Compare commits
20 Commits
| SHA1 |
|---|
| d7abe8b9cb |
| 50ea57d8c1 |
| 9a3be6f778 |
| f321dabce3 |
| 56d7217668 |
| da5b19c159 |
| 86ae142dfe |
| 70cd530578 |
| db7a47012e |
| a9bba42c3d |
| f8ab522bef |
| 98daebd359 |
| a468c3d2ed |
| 7213a1e2fd |
| fd327107df |
| 0d3d190a8b |
| c065a2dea7 |
| 24945b1eb2 |
| c0c2e4bff5 |
| f3c70bd802 |
@@ -364,9 +364,12 @@ gnoma can run a tiny local model alongside the main provider to:

 ```toml
 [slm]
-enabled = true
-backend = "auto" # ollama | llamacpp | llamafile | openaicompat | auto | disabled
-model = "reecdev/tiny3.5:500m"
+enabled = true
+backend = "auto" # ollama | llamacpp | llamafile | openaicompat | auto | disabled
+model = "qwen3:0.6b"
+register_as_arm = true # default; set to false to make the SLM classifier-only
+# (e.g. for FunctionGemma, code-completion-tuned models)
+classify_timeout = "15s" # default; bump higher for slow cold-loads
 ```

 Setup, presets, and verification: [docs/slm-backends.md](docs/slm-backends.md).
@@ -491,6 +494,14 @@ keeps incognito-mode data out of long-lived stores.
> prompts and tool data are sent to that provider as required to
> fulfill the request — by design. For fully on-device operation,
> use Ollama or llama.cpp and `--incognito`.
>
> **Project registry.** gnoma writes a list of directories you've
> launched it from to `~/.config/gnoma/projects.json` (one entry per
> project, with first/last-seen timestamps and a session count). The
> file is purely local — never read by anything outside gnoma, never
> transmitted. It powers `gnoma doctor --all-projects`,
> `gnoma upgrade-config --all`, and the cross-project session picker.
> Opt out with `[config].project_registry = false` in your config.

### Entropy false-positive reduction

@@ -570,9 +581,22 @@ Architecture, conventions, and TDD workflow: [CONTRIBUTING.md](CONTRIBUTING.md).

## About

### Origin

gnoma started as a **provider-agnostic coding CLI** — the bandit router and
multi-provider arm system were the original substance. Building it made the
security gap in existing AI tools obvious: most assume the agent runtime,
the model provider, and every MCP server in the chain is trusted, then add
telemetry on top. The security boundaries gnoma ships are the answer to what
was missing, not the goal it set out with.

### Naming

Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called
**elfs** (elf owl).

### Repositories

- **Upstream:** <https://somegit.dev/Owlibou/gnoma>
- **GitHub mirror:** <https://github.com/VikingOwl91/gnoma> (read-only;
  PRs go to upstream Gitea)

@@ -4,6 +4,128 @@ Active work, newest first.

## In flight

- **TUI/UX refresh — opencode-inspired patterns.** Gap-closing pass over
  the existing Bubble Tea TUI (`internal/tui/*`), borrowing proven UX
  patterns from opencode and two layout *concepts* from opentui
  (re-implemented in Go — opentui is Zig+TS, not consumable here). Items:
  a labelled plan/build mode toggle over the existing permission-mode
  cycle (`app.go:643-668`), a leader-key command palette routing to the
  current pickers, external theme files (`~/.config/gnoma/themes/`),
  syntax-aware diff rendering for `fs.edit` results, a `/sessions`
  picker + transcript `/export` (no server — local only), and a small
  declarative layout helper. Plan:
  [`docs/superpowers/plans/2026-06-04-tui-ux-opencode.md`](docs/superpowers/plans/2026-06-04-tui-ux-opencode.md).

- **Multi-Agent Engineering Forge (MAEF) — `gnoma forge`.** Deterministic
  pipeline orchestrator: Context Planner → Forge → Sandbox gate →
  Cross-Vendor Critic, with programmatic loop-back gates. Maps onto
  existing machinery — the orchestrator is a Go state machine
  (`internal/forge`), the three LLM stages are elfs
  (`elf.Manager.Spawn`/`SpawnWithProvider`), the Sandbox gate is a
  **non-LLM** Go function over a new `internal/sandbox` (git-worktree
  default, docker optional behind one interface). Forge emits unified
  diffs applied via `git apply` (not `fs.edit`); the Critic is pinned to
  a different vendor/arm than the Forge via `router.ForceArm`. Terminal
  state-sync failures revert the worktree (no infinite loop). All
  firewall/audit/egress/CWD boundaries apply per stage. Plan:
  [`docs/superpowers/plans/2026-06-04-multi-agent-engineering-forge.md`](docs/superpowers/plans/2026-06-04-multi-agent-engineering-forge.md).
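
The pipeline above can be pictured as a small bounded state machine. The sketch below is only an illustration under stated assumptions: the stage names, the `gate` callback, the choice to loop back to the Forge stage on any failed gate, and the `maxAttempts` bound are all hypothetical and are not taken from `internal/forge`.

```go
package main

import "fmt"

type stage int

const (
	stagePlan stage = iota
	stageForge
	stageSandbox
	stageCritic
	stageDone
	stageReverted
)

// gate reports whether the given stage passed on the given attempt.
// In this sketch it stands in for both the sandbox check and the critic.
type gate func(s stage, attempt int) bool

// run walks Plan -> Forge -> Sandbox -> Critic. A failed sandbox or
// critic gate loops back to Forge (an assumption of this sketch).
// After maxAttempts forge rounds the run ends in stageReverted, so the
// loop cannot spin forever.
func run(pass gate, maxAttempts int) (stage, int) {
	s, attempt := stagePlan, 0
	for {
		switch s {
		case stagePlan:
			s = stageForge
		case stageForge:
			attempt++
			if attempt > maxAttempts {
				return stageReverted, attempt - 1
			}
			s = stageSandbox
		case stageSandbox:
			if pass(s, attempt) {
				s = stageCritic
			} else {
				s = stageForge
			}
		case stageCritic:
			if pass(s, attempt) {
				return stageDone, attempt
			}
			s = stageForge
		}
	}
}

func main() {
	// Gates start passing on the second forge attempt.
	final, attempts := run(func(_ stage, attempt int) bool { return attempt >= 2 }, 3)
	fmt.Println("done:", final == stageDone, "attempts:", attempts)
}
```

Making the gates plain functions keeps the orchestration deterministic: the LLM stages only produce output, and whether the pipeline advances is decided by code.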

- **models.dev as source of truth for model specs & pricing.** Adopt
  models.dev (`api.json`) for objective facts — context window, max
  output, modalities, tool-use, reasoning, **price** — feeding
  `provider.Capabilities` and the currently-mostly-empty
  `Arm.CostPer1k{Input,Output}` (`router.go:393,418` seam). Subjective
  routing policy (`MaxComplexity`/`Strengths`/`CostWeight`/`SizeCaps` in
  `internal/router/defaults.go`) stays hand-curated — augment, don't
  replace. Offline-first: a `//go:embed` snapshot ships in the binary;
  `gnoma models refresh` is opt-in. **Configurable display currency**
  (USD/EUR/…) with a daily best-effort FX rate fetched on launch and
  cached; disable → USD (models.dev native). Per-arm price overrides via
  `[[provider.cost]]` (incl. `billing="subscription"`, intersects the
  MiniMax plan). `models.dev` + the FX source join the egress allowlist.
  Plan:
  [`docs/superpowers/plans/2026-06-04-models-dev-source-of-truth.md`](docs/superpowers/plans/2026-06-04-models-dev-source-of-truth.md).
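
To make the pricing flow concrete, here is a minimal sketch of how a per-1k-token price, a per-arm override, and a display-currency FX rate could combine. The `price` type, the `resolve` and `cost` helpers, the model name, and the numbers are all hypothetical; the real `Arm.CostPer1k{Input,Output}` wiring may look quite different.

```go
package main

import "fmt"

// price holds USD prices per 1k tokens (hypothetical type).
type price struct {
	InPer1k  float64
	OutPer1k float64
}

// resolve prefers a user override over the embedded snapshot,
// mirroring the "augment, don't replace" idea for prices.
func resolve(model string, snapshot, overrides map[string]price) (price, bool) {
	if p, ok := overrides[model]; ok {
		return p, true
	}
	p, ok := snapshot[model]
	return p, ok
}

// cost converts token counts to the display currency. fx is the
// USD-to-display rate; pass 1 when FX lookup is disabled.
func cost(p price, inTok, outTok int, fx float64) float64 {
	usd := float64(inTok)/1000*p.InPer1k + float64(outTok)/1000*p.OutPer1k
	return usd * fx
}

func main() {
	snapshot := map[string]price{"example-model": {InPer1k: 0.5, OutPer1k: 1.5}}
	overrides := map[string]price{} // no per-arm override in this run

	p, ok := resolve("example-model", snapshot, overrides)
	if !ok {
		fmt.Println("no price known")
		return
	}
	fmt.Printf("USD: %.2f\n", cost(p, 2000, 1000, 1))
	fmt.Printf("display currency at an assumed rate of 0.5: %.2f\n", cost(p, 2000, 1000, 0.5))
}
```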

- **MiniMax provider — cloud arm + subscription token plan.** Add
  MiniMax (api.minimax.io / api.minimaxi.com) as a first-class cloud
  provider so it can register as a router arm alongside
  anthropic/openai/google/mistral.

  **API surface.** MiniMax ships *two* compatible HTTP surfaces, one
  OpenAI-compatible and one Anthropic-compatible, so this is a
  base-URL + auth wiring task, not a new translation layer:
  - **OpenAI-compatible** chat-completions at `…/v1` — reusable via
    `internal/provider/openaicompat`. Cleanest first cut: add a
    `NewMiniMax(cfg)` constructor mirroring `NewOllama` /
    `NewLlamaCpp` (`openaicompat/provider.go`) with the MiniMax base
    URL baked in, then a `case "minimax"` in
    `createProvider` (`cmd/gnoma/main.go:1265`) and the available-
    providers usage string (`:1279`).
  - **Anthropic-compatible** endpoint (`…/anthropic`) — alternative
    backing via the existing `anthropic` provider with a `BaseURL`
    override. Decide one canonical path; OpenAI-compat is the lower-
    risk default since `openaicompat` is already exercised by the
    local backends.
  - **Auth.** Bearer API key. `envKeyFor`'s default branch
    (`main.go:1199`) already resolves `MINIMAX_API_KEY` with no code
    change; add an explicit `case "minimax"` only if we want a
    friendlier name or alternates list.
  - **Models.** `MiniMax-M2` (agentic/coding, the one to default to),
    `MiniMax-M1`, abab6.5 series. Set `Strengths` + `MaxComplexity`
    + `CostWeight` on the arm so the selector treats it as a cheap
    high-capability cloud tier.

  **Token plan (open question — affects auth + billing UX).** MiniMax
  offers a flat-rate **Coding Plan** subscription (token-quota based,
  Claude-Max-style) *in addition to* metered pay-as-you-go API
  credits. Both authenticate with the same Bearer key, so no adapter
  difference — but the router's `CostWeight` math assumes metered
  per-token pricing. Under a subscription the marginal cost is ~0
  until the quota is hit, then hard-stops. Decisions to make:
  - How to model "subscription" cost in the selector — e.g. a
    `[provider.minimax].billing = "subscription" | "metered"` knob
    that zeroes `CostWeight` while quota remains, vs. real per-token
    cost when metered.
  - Quota exhaustion handling — surface the 429/quota error cleanly
    and let the bandit fail over to the next arm (ties into the
    session error-recovery work in `0d3d190`).
  - Document both plans + the region split (`api.minimax.io`
    international vs `api.minimaxi.com`) in `docs/slm-backends.md` /
    provider docs.

  Smallest shippable slice: OpenAI-compat `NewMiniMax` + metered
  pricing, registered as a cloud arm. Subscription/quota modelling is
  the follow-up once the billing knob lands. Plan:
  [`docs/superpowers/plans/2026-06-04-minimax-provider.md`](docs/superpowers/plans/2026-06-04-minimax-provider.md).
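
One way to picture the open billing question is the sketch below, which treats a subscription arm as zero-cost while quota remains and as unusable once the quota is gone, so selection falls through to the next arm. Everything here is hypothetical: the `arm` fields, the `billing` values, and the greedy `cheapest` picker are stand-ins, and the real selector is a bandit rather than a simple minimum.

```go
package main

import "fmt"

type billing string

const (
	billingMetered      billing = "metered"
	billingSubscription billing = "subscription"
)

// arm is a simplified, hypothetical view of a router arm.
type arm struct {
	Name       string
	Billing    billing
	CostWeight float64 // used as-is for metered arms
	QuotaLeft  int     // remaining tokens; only meaningful for subscriptions
}

// exhausted reports whether a subscription arm has hit its quota.
func (a arm) exhausted() bool {
	return a.Billing == billingSubscription && a.QuotaLeft <= 0
}

// effectiveCostWeight zeroes the cost for subscription arms, since
// their marginal cost is roughly nothing until the quota runs out.
func (a arm) effectiveCostWeight() float64 {
	if a.Billing == billingSubscription {
		return 0
	}
	return a.CostWeight
}

// cheapest returns the usable arm with the lowest effective cost
// weight, skipping exhausted subscription arms (the failover case).
func cheapest(arms []arm) (arm, bool) {
	var best arm
	found := false
	for _, a := range arms {
		if a.exhausted() {
			continue
		}
		if !found || a.effectiveCostWeight() < best.effectiveCostWeight() {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	arms := []arm{
		{Name: "minimax", Billing: billingSubscription, CostWeight: 0.3, QuotaLeft: 1000},
		{Name: "openai", Billing: billingMetered, CostWeight: 0.5},
	}
	if a, ok := cheapest(arms); ok {
		fmt.Println("with quota left:", a.Name)
	}
	arms[0].QuotaLeft = 0
	if a, ok := cheapest(arms); ok {
		fmt.Println("after quota exhaustion:", a.Name)
	}
}
```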

- **Agent Client Protocol (ACP) support.** Run gnoma as an *ACP agent*
  (`gnoma acp`) so any ACP-capable editor (Zed, Kiro, OpenCode, …) can
  drive it as an external coding agent. ACP is "the LSP for AI coding
  agents": JSON-RPC 2.0 over stdio, editor (client) spawns agent
  (subprocess). gnoma already owns the hard parts — agentic engine,
  tools, permissions, and JSON-RPC-over-stdio (from its MCP-client
  side, `internal/mcp/jsonrpc.go`). The fit is symmetric: gnoma is the
  JSON-RPC *server* here. No Go SDK exists (official SDKs are
  TS/Python/Rust/Kotlin), so gnoma implements the wire protocol
  natively against the schema. `session/new` can declare `mcpServers`,
  so ACP and gnoma's existing MCP manager wire up in one handshake.

  **Dual role — both directions:**
  1. **gnoma as ACP agent (server)** — `gnoma acp` over stdio so
     editors drive gnoma.
  2. **gnoma as ACP client** — gnoma spawns *external* ACP agents
     (Claude, Gemini CLI, Codex, …) and uses them as router-arm
     provider backends. This is the same shape as the existing
     `internal/provider/subprocess` CLI-agent arms
     (`cmd/gnoma/main.go:521-531`, `IsCLIAgent: true`) but over
     standardized ACP JSON-RPC — gaining structured tool-call
     surfacing, real turn/permission semantics, and cancellation
     that the current one-shot stream-json subprocess provider
     lacks (it sets `ToolUse:false` for agents without stream-json).

  Upstream: <https://github.com/agentclientprotocol>. Plan:
  [`docs/superpowers/plans/2026-06-04-agent-client-protocol.md`](docs/superpowers/plans/2026-06-04-agent-client-protocol.md).

- **Config write/merge — silent corruption of layered configs.**
  `internal/config/write.go:setConfig` reads the existing TOML into a
  zero-valued `Config` struct, sets one field, and writes the entire
@@ -146,7 +268,10 @@ Active work, newest first.
   decision in #1.

   Surfaced from the r/coolgithubprojects v0.3.1 launch thread
-  (2026-05-24, `u/Ha_Deal_5079`).
+  (2026-05-24, `u/Ha_Deal_5079`). The encoder + contextual bandit
+  alternative is now sketched in
+  [`docs/superpowers/plans/2026-05-25-encoder-bandit-router.md`](docs/superpowers/plans/2026-05-25-encoder-bandit-router.md) —
+  that plan supersedes #1 above when it ships.

 - **Security boundary — egress controls + session audit log.** The
   current `Firewall` is a content boundary only (scans messages and
@@ -156,18 +281,98 @@ Active work, newest first.
   with no per-host allowlist or dial-layer interception. Two follow-
   ups surfaced from the r/SideProject v0.3.0 launch thread
   (2026-05-24, `u/Secret_Theme3192`):
-  1. **Per-session audit log of blocked/redacted events** —
-     grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
-     user can answer "what did the firewall do this session?" in
-     one command. Today the `slog` output goes to whatever sink is
-     configured, with no per-session grouping.
-  2. **Per-host egress allowlist (HTTP transport layer)** — open
-     design question: host-level (`allow api.openai.com, deny *`)
-     vs per-tool (`bash can only hit these hosts`). Reply asked
-     the commenter for their mental model; revisit when feedback
-     lands. The README and v0.3.0 Reddit post phrasing oversold
-     "network egress gated"; corrected in the same commit as this
-     TODO entry.
+  1. **Per-session audit log of blocked/redacted events** — ✅ JSONL
+     writing **implemented**: `internal/security/audit.go` +
+     wiring at `cmd/gnoma/main.go:685-691`
+     (`.gnoma/sessions/<id>/audit.jsonl`), recorded from
+     `firewall.go:152/173/186`. **Remaining gap:** no CLI to *read*
+     it — a `gnoma firewall audit` viewer is folded into the egress
+     plan (shares the `gnoma firewall` command surface).
+  2. **Per-host egress allowlist (HTTP transport layer)** — design
+     refined by `u/HarjjotSinghh` on the r/SideProject thread
+     (2026-05-28). Three-stage rollout, not a single-shot
+     "block everything except X" default:
+     - **Learn.** First run logs every egress destination per
+       (project, agent, tool) tuple without blocking.
+     - **Review.** New `gnoma firewall review` subcommand surfaces
+       the captured set; user marks each destination as
+       allow / deny / scoped.
+     - **Enforce.** Subsequent runs block unrecognised destinations
+       with a clear violation log (lives alongside the per-session
+       audit log from item #1).
+
+     Default baseline destinations (curated, ship-in-the-binary):
+     - **Package ecosystems:** github.com, npm registry,
+       pypi.org, crates.io, docker hub, golang.org/proxy.golang.org.
+     - **Model providers:** anthropic, openai, google, mistral —
+       plus user-configured local ollama / llamacpp endpoints
+       read from `[provider.endpoints]`.
+
+     The painful middle ground is SDK egress (sentry, stripe,
+     supabase, datadog, …) — these break a "block unknown"
+     default fast, which is why the Learn → Review → Enforce
+     flow is the only thing that scales. Per-tool scoping
+     (`bash` can only reach hosts X, MCP server Y can only reach
+     hosts Z) is the layer above the project-wide allowlist.
+
+     The README and v0.3.0 Reddit post phrasing oversold
+     "network egress gated"; corrected in the README scope note
+     and the audit-log commit.
+
+  Egress plan (incl. the `gnoma firewall audit` viewer for item #1):
+  [`docs/superpowers/plans/2026-06-04-egress-allowlist.md`](docs/superpowers/plans/2026-06-04-egress-allowlist.md).
+
+- **Cross-platform support — Windows + macOS.** GoReleaser builds
+  static binaries for `linux/darwin/windows × amd64/arm64` every
+  release but only Linux is exercised at all today. Windows and
+  macOS binaries ship untested. Surfaced 2026-05-28 (r/SideProject
+  reply to `u/HarjjotSinghh`) — answered "yes Windows builds ship"
+  but honestly couldn't claim they're tested. His framing was
+  specifically that the `r/devops` audience will surface predictable
+  questions "within a week" — list below maps each question to the
+  underlying gnoma-side gap.
+
+  ### Phase 1 — smoke tests (unblock the honest answer)
+
+  Non-blocking GitHub Actions matrix job per tag: pull each release
+  archive, run `gnoma --version && echo hi | gnoma --provider
+  ollama` against a stub provider. Confirms the binary executes and
+  the TUI doesn't crash before any real bug-hunt starts.
+
+  ### Phase 2 — Windows-specific concerns (r/devops question pattern)
+
+  Each row is an expected r/devops question, the gnoma-side gap it
+  exposes, and the rough fix scope. Order roughly by "how soon would
+  this come up in a thread":
+
+  | Question | Gap | Fix scope |
+  |---|---|---|
+  | "Does it work in PowerShell?" | Shell quoting in `internal/tool/bash` assumes POSIX; ANSI escape handling not tested against PowerShell + Windows Terminal | Add a PowerShell quoter (Quote a la `Get-Process "$arg"` rules); test ANSI emission against `Out-Host` and legacy `conhost.exe` |
+  | "WSL or native?" | Both should work; not documented; corporate-managed Windows VMs often lack WSL | One README line + a smoke test invocation under each |
+  | "Respects system proxy / corporate proxy?" | Go `http.Client` reads `HTTP_PROXY`/`HTTPS_PROXY` env vars but **does not** read Windows system proxy registry or PAC files. Corporate networks rely on these. | Either document the env-var workaround, or vendor a PAC-aware transport (e.g. `github.com/rapid7/go-get-proxied`); test path covered by Phase 1 smoke matrix |
+  | "Authenticode signed binary?" | Releases are unsigned; SmartScreen will warn, some corp policies block | GoReleaser supports cosign + signtool integration; needs an EV cert (or Azure Trusted Signing) — non-trivial cost. Document the workaround for now: "right-click → Properties → Unblock" |
+  | "MSI installer?" | We ship a zip; some shops can't deploy raw zips through SCCM / Intune | Add an `.msi` artifact to GoReleaser via `go-msi` or `wix`. Mid-effort; gated on whether anyone actually asks for it (post the question to the eventual r/devops thread, see who upvotes) |
+  | "Windows Event Viewer integration?" | Logs go to slog default sink + per-session audit log under project root | Document the audit log location explicitly; add a `--log-format=eventlog` mode later if anyone asks |
+  | "Group Policy hooks?" | None. Config is per-user TOML. | Out of scope short-term. Document `[provider.endpoints]` + `[router].prefer` as the levers admins would use via login script / config push |
+  | "Air-gapped install?" | Static binary works; ollama dependency is the problem (model downloads, runtime updates) | Document the offline flow: pre-download models via `ollama pull` on a connected machine, ship to the air-gapped network. Not a code change, just a doc gap |
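
For the PowerShell row in the table above, the core of a quoter is small. The sketch below contrasts POSIX and PowerShell single-quote escaping; the function names are hypothetical and this is not the code in `internal/tool/bash`. It covers only literal single-quoted strings and does not attempt the separate rules PowerShell applies when passing arguments on to native executables.

```go
package main

import (
	"fmt"
	"strings"
)

// quotePOSIX wraps s in single quotes for a POSIX shell. An embedded
// single quote closes the string, adds an escaped quote, and reopens it.
func quotePOSIX(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// quotePowerShell wraps s in single quotes for PowerShell, where a
// single-quoted string is literal and an embedded single quote is
// written by doubling it.
func quotePowerShell(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

func main() {
	arg := "it's $HOME"
	fmt.Println("posix:     ", quotePOSIX(arg))
	fmt.Println("powershell:", quotePowerShell(arg))
}
```

Single quotes are used in both cases so that `$` and backticks stay literal instead of being expanded by the shell.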
+
+  ### Phase 3 — macOS concerns
+
+  Smaller surface; mostly Apple-silicon launch sanity (the arm64
+  binary works) + Gatekeeper / notarization warning on first run.
+  Same documentation note as Authenticode applies.
+
+  ### Pre-conditions for posting to r/devops
+
+  Per [[next-reddit-post]], the security-observation post should land
+  on r/devops eventually. **Don't post until Phase 1 is in place** so
+  the predictable "did you test it?" question has an honest answer.
+  Phase 2 items don't all need to ship first — but each one needs at
+  least a TODO-linked acknowledgement in the post body so the
+  thread sees gnoma takes the gaps seriously.
+
+  Plan (build-tag scaffolding + concrete code touch-points):
+  [`docs/superpowers/plans/2026-06-04-cross-platform.md`](docs/superpowers/plans/2026-06-04-cross-platform.md).

 - **Tool-router specialization (functiongemma)** — gated on telemetry,
   not committed. Phase A.2 adds did-switch-rate measurement to the
@@ -213,7 +418,8 @@ Active work, newest first.
   from `dockers` + `docker_manifests` to `dockers_v2` in
   `.goreleaser.yml` (collapses ~45 lines into one block but
   requires Dockerfile changes for the per-platform binary layout
-  — deferred to its own commit before v0.3.0).
+  — deferred to its own commit before v0.3.0). Plan:
+  [`docs/superpowers/plans/2026-06-04-distribution-followups.md`](docs/superpowers/plans/2026-06-04-distribution-followups.md).

 ## Stable backlog (not in active phases)

@@ -0,0 +1,122 @@
package main

import (
	"fmt"
	"os"

	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
)

// runConfigCommand handles `gnoma config <subcommand>`. The
// subcommand is the only CLI surface for writing to the layered
// config (the rest of the binary reads via gnomacfg.Load).
//
// Subcommands:
//   - set <key> <value>   write a key to the project config (or
//     global with --global). Whitelisted keys
//     only — see gnomacfg.AllowedKeys().
//   - keys                list the whitelisted keys and what they do.
func runConfigCommand(args []string) int {
	if len(args) == 0 {
		printConfigUsage(os.Stderr)
		return 1
	}
	switch args[0] {
	case "set":
		return runConfigSet(args[1:])
	case "keys":
		return runConfigKeys()
	case "help", "-h", "--help":
		printConfigUsage(os.Stdout)
		return 0
	default:
		fmt.Fprintf(os.Stderr, "unknown config command: %s\n", args[0])
		printConfigUsage(os.Stderr)
		return 1
	}
}

func printConfigUsage(w *os.File) {
	pfln(w, "usage: gnoma config <command>")
	pfln(w, "commands:")
	pfln(w, " set <key> <value> write a key to the project config (use --global for the global file)")
	pfln(w, " keys list the whitelisted keys")
}

// pfln is the *os.File counterpart of pf/pln in profile_cmd.go: it
// prints one line and deliberately discards the write error. Note
// that *os.File does satisfy io.Writer (Write returns (int, error));
// this wrapper only keeps the usage call sites short.
func pfln(w *os.File, args ...any) {
	_, _ = fmt.Fprintln(w, args...)
}

func runConfigSet(args []string) int {
	global := false
	// Manual flag parse to keep the surface tiny: the command
	// takes at most one flag and two positional args. Positional
	// args are collected into a fresh slice so the caller's args
	// are never modified in place.
	keyArgs := make([]string, 0, len(args))
	for _, a := range args {
		if a == "--global" {
			global = true
			continue
		}
		keyArgs = append(keyArgs, a)
	}
	if len(keyArgs) != 2 {
		fmt.Fprintln(os.Stderr, "usage: gnoma config set [--global] <key> <value>")
		return 1
	}
	key, value := keyArgs[0], keyArgs[1]

	var err error
	if global {
		err = gnomacfg.SetGlobalConfig(key, value)
	} else {
		err = gnomacfg.SetProjectConfig(key, value)
	}
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return 1
	}

	target := "project"
	if global {
		target = "global"
	}
	fmt.Printf("set %s = %q (%s config)\n", key, value, target)
	return 0
}

func runConfigKeys() int {
	fmt.Println("whitelisted config keys (gnoma config set <key> <value>):")
	fmt.Println()

	// Brief description for each key. Keep this in sync with
	// the Config struct field tags and the defaults in
	// gnomacfg.Defaults().
	descriptions := map[string]string{
		"provider.default": "default provider name (e.g. anthropic, openai, ollama)",
		"provider.model":   "default model name (e.g. claude-opus-4-7)",
		"permission.mode":  "permission mode: auto, allow, deny",
		"slm.model_url":    "llamafile-only: URL to download the model binary from",
		"slm.enabled":      "enable the SLM classifier (true/false)",
		"slm.data_dir":     "llamafile-only: where to put the downloaded model",
		"tui.theme":        "TUI theme name (e.g. catppuccin, dracula)",
		"tui.vim":          "enable vim keybindings in the TUI (true/false)",
	}
	keys := gnomacfg.AllowedKeys()
	for _, k := range keys {
		desc, ok := descriptions[k]
		if !ok {
			desc = "(no description)"
		}
		fmt.Printf(" %-22s %s\n", k, desc)
	}
	fmt.Println()
	fmt.Println("Tip: by default `set` writes to the project config")
	fmt.Println("(.gnoma/config.toml). Pass --global to write to the")
	fmt.Println("global config (~/.config/gnoma/config.toml) instead.")
	return 0
}
@@ -0,0 +1,91 @@
package main

import (
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// TestRunConfigSet_WritesAllowedKey exercises the `gnoma config set`
// happy path: it writes the key to the project config file and
// emits the confirmation line. The atomic write is verified by
// `TestSetProjectConfig_AtomicWriteLeavesNoTempFile` in
// internal/config; this test just covers the CLI plumbing.
func TestRunConfigSet_WritesAllowedKey(t *testing.T) {
	dir := t.TempDir()
	t.Setenv("XDG_CONFIG_HOME", dir)

	// Run from a fresh project dir so projectConfigPath() picks
	// up the new location.
	origDir, _ := os.Getwd()
	projectDir := filepath.Join(dir, "project")
	if err := os.MkdirAll(projectDir, 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.Chdir(projectDir); err != nil {
		t.Fatalf("chdir: %v", err)
	}
	t.Cleanup(func() { _ = os.Chdir(origDir) })

	// Set TUI theme to dracula.
	if rc := runConfigSet([]string{"tui.theme", "dracula"}); rc != 0 {
		t.Fatalf("runConfigSet rc=%d", rc)
	}

	// Project config should now contain the value.
	data, err := os.ReadFile(filepath.Join(projectDir, ".gnoma", "config.toml"))
	if err != nil {
		t.Fatalf("read: %v", err)
	}
	if !strings.Contains(string(data), `theme = "dracula"`) {
		t.Errorf("config missing set value, got:\n%s", data)
	}
}

// TestRunConfigSet_RejectsUnknownKey verifies the CLI surfaces the
// allowlist error rather than silently no-op'ing.
func TestRunConfigSet_RejectsUnknownKey(t *testing.T) {
	dir := t.TempDir()
	origDir, _ := os.Getwd()
	if err := os.Chdir(dir); err != nil {
		t.Fatalf("chdir: %v", err)
	}
	t.Cleanup(func() { _ = os.Chdir(origDir) })

	// runConfigSet prints an "error:" line to stderr here; that
	// output is expected and is not captured by this test.
	rc := runConfigSet([]string{"not.a.real.key", "x"})
	if rc == 0 {
		t.Errorf("expected non-zero rc for unknown key, got 0")
	}
}

// TestRunConfigKeys_ListsAllAllowedKeys verifies the `keys`
// subcommand surfaces every entry from gnomacfg.AllowedKeys().
func TestRunConfigKeys_ListsAllAllowedKeys(t *testing.T) {
	// Redirect stdout to a buffer; the function prints directly
	// to os.Stdout.
	origStdout := os.Stdout
	r, w, _ := os.Pipe()
	os.Stdout = w
	t.Cleanup(func() { os.Stdout = origStdout })

	rc := runConfigKeys()
	_ = w.Close()
	if rc != 0 {
		t.Fatalf("runConfigKeys rc=%d", rc)
	}

	buf := make([]byte, 4096)
	n, _ := r.Read(buf)
	out := string(buf[:n])
	for _, k := range []string{
		"provider.default", "provider.model", "permission.mode",
		"slm.model_url", "slm.enabled", "slm.data_dir",
		"tui.theme", "tui.vim",
	} {
		if !strings.Contains(out, k) {
			t.Errorf("keys output missing %q, got:\n%s", k, out)
		}
	}
}
@@ -0,0 +1,159 @@
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"

	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
)

// runDoctorCommand handles `gnoma doctor`. Read-only diagnostic
// over config files. Default: scans the project config (and
// the global config if the project one is missing). With
// `--all-projects`, walks the registry. With `--json`,
// emits structured findings to stdout for CI consumption.
// Exits non-zero on Warn+ findings (CI-friendly).
func runDoctorCommand(args []string) int {
	jsonOutput := false
	allProjects := false
	// Collect positional args into a fresh slice. Deleting flags
	// from args in place while ranging over it shifts the later
	// elements, so a flag that follows another flag can be skipped
	// (for example `--json --all-projects path`).
	var pathArgs []string
	for _, a := range args {
		switch a {
		case "--json":
			jsonOutput = true
		case "--all-projects":
			allProjects = true
		default:
			pathArgs = append(pathArgs, a)
		}
	}

	var paths []string
	switch {
	case allProjects:
		loaded, err := gnomacfg.LoadRegistry()
		if err != nil {
			fmt.Fprintf(os.Stderr, "error: load registry: %v\n", err)
			return 1
		}
		// Always include the global config in --all-projects
		// mode (it applies to every project). Then per-project
		// configs from the registry. Files that don't exist
		// are filtered out — the doctor reports a finding for
		// them, but in --all-projects mode we silently skip
		// rather than reporting every project root that has
		// been visited but has no config.
		paths = append(paths, gnomacfg.GlobalConfigPath())
		for _, p := range loaded.Projects {
			paths = append(paths, gnomacfg.ProjectConfigPathFor(p.Path))
		}
		// Dedupe and sort for deterministic output.
		seen := map[string]bool{}
		var deduped []string
		for _, p := range paths {
			if seen[p] {
				continue
			}
			seen[p] = true
			deduped = append(deduped, p)
		}
		sort.Strings(deduped)
		paths = deduped
	case len(pathArgs) == 0:
		paths = []string{gnomacfg.ProjectConfigPath()}
	case len(pathArgs) == 1:
		paths = []string{pathArgs[0]}
	default:
		fmt.Fprintln(os.Stderr, "usage: gnoma doctor [--all-projects] [--json] [path]")
		return 1
	}

	doc := gnomacfg.NewDoctor()
	findings := doc.DiagnoseFiles(paths)

	// Cross-file layering checks in --all-projects mode. For
	// each registered project, compare the global config
	// against the project's and surface shadowing cases —
	// the original 2026-05-24 silent-corruption bug.
	if allProjects {
		loaded, err := gnomacfg.LoadRegistry()
		if err == nil {
			for _, p := range loaded.Projects {
				projectPath := gnomacfg.ProjectConfigPathFor(p.Path)
				if _, statErr := os.Stat(projectPath); statErr != nil {
					continue
				}
				findings = append(findings, doc.DiagnoseLayering(gnomacfg.GlobalConfigPath(), projectPath)...)
			}
		}
	}

	return renderAndExit(findings, jsonOutput)
}

// renderAndExit emits findings to stdout (text or JSON per
|
||||
// the --json flag) and returns the exit code:
|
||||
//
|
||||
// 0 — clean (no findings, or only Info findings)
|
||||
// 1 — Warn or Error findings present
|
||||
//
|
||||
// Error findings indicate file-level failures (missing or
|
||||
// corrupt files); for those the message is the only signal.
|
||||
// Warn findings are the actionable ones — the user should
|
||||
// review and fix.
|
||||
func renderAndExit(findings []gnomacfg.Finding, jsonOutput bool) int {
|
||||
if jsonOutput {
|
||||
enc := json.NewEncoder(os.Stdout)
|
||||
enc.SetIndent("", " ")
|
||||
if err := enc.Encode(findings); err != nil {
|
||||
fmt.Fprintf(os.Stderr, "error: encode json: %v\n", err)
|
||||
return 1
|
||||
}
|
||||
} else {
|
||||
renderText(os.Stdout, findings)
|
||||
}
|
||||
|
||||
for _, f := range findings {
|
||||
if f.Severity >= gnomacfg.SeverityWarn {
|
||||
return 1
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}

// renderText writes findings in a human-readable columnar
// format. Severity column, then path:key, then message.
// Color is intentionally omitted — this is for terminals and
// CI logs alike.
func renderText(w *os.File, findings []gnomacfg.Finding) {
	if len(findings) == 0 {
		_, _ = fmt.Fprintln(w, "no findings — config looks clean")
		return
	}
	// Find the longest path:key for column alignment.
	maxWidth := 0
	for _, f := range findings {
		loc := f.Path
		if f.Key != "" {
			loc = f.Path + ":" + f.Key
		}
		if len(loc) > maxWidth {
			maxWidth = len(loc)
		}
	}
	for _, f := range findings {
		loc := f.Path
		if f.Key != "" {
			loc = f.Path + ":" + f.Key
		}
		_, _ = fmt.Fprintf(w, "%-7s %-*s %s\n", f.Severity, maxWidth, loc, f.Message)
		if f.Suggestion != "" {
			_, _ = fmt.Fprintf(w, "%-7s %-*s → %s\n", "", maxWidth, "", f.Suggestion)
		}
	}
}
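The two-pass alignment in `renderText` (measure the widest `path:key`, then pad with `%-*s`) can be tried in isolation. This is a minimal sketch with a hypothetical `row` type and plain-string severities; it writes to an `io.Writer` so the output can be checked in memory, which is a design choice of the sketch and not of the code above.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// row is an illustrative stand-in for a finding; all fields are plain strings.
type row struct {
	Severity, Path, Key, Message string
}

// location joins path and key the same way renderText does.
func location(r row) string {
	if r.Key != "" {
		return r.Path + ":" + r.Key
	}
	return r.Path
}

// render pads the severity to 7 columns and the location to the widest
// location seen, so messages line up. Width is measured in bytes.
func render(w io.Writer, rows []row) {
	width := 0
	for _, r := range rows {
		if n := len(location(r)); n > width {
			width = n
		}
	}
	for _, r := range rows {
		fmt.Fprintf(w, "%-7s %-*s %s\n", r.Severity, width, location(r), r.Message)
	}
}

// renderString is a convenience wrapper for in-memory checks.
func renderString(rows []row) string {
	var sb strings.Builder
	render(&sb, rows)
	return sb.String()
}

func main() {
	render(os.Stdout, []row{
		{"warn", "a.toml", "k", "bad"},
		{"error", "b.toml", "", "missing"},
	})
}
```

Because `len` counts bytes, paths containing multi-byte characters would be padded slightly short; that caveat applies to any byte-length based alignment.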

// Ensure the file ends cleanly.
var _ = renderAndExit
@@ -0,0 +1,213 @@
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
	"strings"
	"testing"

	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
)

// TestRunDoctorCommand_CleanFileExitsZero verifies the
// happy path: a valid config produces no findings and the
// command exits 0.
func TestRunDoctorCommand_CleanFileExitsZero(t *testing.T) {
	dir := t.TempDir()
	t.Setenv("XDG_CONFIG_HOME", dir)

	origDir, _ := os.Getwd()
	projectDir := filepath.Join(dir, "project")
	if err := os.MkdirAll(projectDir, 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.Chdir(projectDir); err != nil {
		t.Fatalf("chdir: %v", err)
	}
	t.Cleanup(func() { _ = os.Chdir(origDir) })

	// Create a project config with a valid user value.
	if err := os.MkdirAll(filepath.Join(projectDir, ".gnoma"), 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.WriteFile(
		filepath.Join(projectDir, ".gnoma", "config.toml"),
		[]byte("[provider]\ndefault = \"anthropic\"\n"),
		0o644,
	); err != nil {
		t.Fatalf("seed: %v", err)
	}

	if rc := runDoctorCommand(nil); rc != 0 {
		t.Errorf("rc = %d, want 0 for clean file", rc)
	}
}

// TestRunDoctorCommand_WarnFindingExitsOne verifies the
// CI-friendly exit code: a Warn finding (invalid enum
// value) causes a non-zero exit.
func TestRunDoctorCommand_WarnFindingExitsOne(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "config.toml")
	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
		t.Fatalf("seed: %v", err)
	}

	if rc := runDoctorCommand([]string{path}); rc != 1 {
		t.Errorf("rc = %d, want 1 for warn finding", rc)
	}
}

// TestRunDoctorCommand_JSONOutputIsValidJSON verifies the
// --json flag emits parseable JSON to stdout, suitable for
// CI/script consumption.
func TestRunDoctorCommand_JSONOutputIsValidJSON(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "config.toml")
	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
		t.Fatalf("seed: %v", err)
	}

	// Capture stdout.
	origStdout := os.Stdout
	r, w, _ := os.Pipe()
	os.Stdout = w
	t.Cleanup(func() { os.Stdout = origStdout })

	rc := runDoctorCommand([]string{path, "--json"})
	_ = w.Close()
	if rc != 1 {
		t.Errorf("rc = %d, want 1", rc)
	}

	buf := make([]byte, 8192)
	n, _ := r.Read(buf)
	out := string(buf[:n])

	// Should be valid JSON array of Finding objects.
	var findings []map[string]any
	if err := json.Unmarshal([]byte(out), &findings); err != nil {
		t.Fatalf("json.Unmarshal: %v\noutput:\n%s", err, out)
	}
	if len(findings) == 0 {
		t.Errorf("json output had zero findings; expected at least one")
	}
	if findings[0]["severity"] != "warn" {
		t.Errorf("severity = %v, want warn", findings[0]["severity"])
	}
}

// TestRunDoctorCommand_TextOutputIncludesFindingKey verifies
// the human-readable output format. Should include the file
// path and the finding key.
func TestRunDoctorCommand_TextOutputIncludesFindingKey(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "config.toml")
	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
		t.Fatalf("seed: %v", err)
	}

	origStdout := os.Stdout
	r, w, _ := os.Pipe()
	os.Stdout = w
	t.Cleanup(func() { os.Stdout = origStdout })

	rc := runDoctorCommand([]string{path})
	_ = w.Close()
	if rc != 1 {
		t.Errorf("rc = %d, want 1", rc)
	}

	buf := make([]byte, 4096)
	n, _ := r.Read(buf)
	out := string(buf[:n])

	if !strings.Contains(out, "permission.mode") {
		t.Errorf("output missing key, got:\n%s", out)
	}
	if !strings.Contains(out, path) {
		t.Errorf("output missing path, got:\n%s", out)
	}
	if !strings.Contains(out, "warn") {
		t.Errorf("output missing severity, got:\n%s", out)
	}
}

// TestRunDoctorCommand_MissingFileExitsOne documents the
// error path: a missing config file produces a single
// SeverityError finding and the command exits 1.
func TestRunDoctorCommand_MissingFileExitsOne(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "nonexistent.toml")

	if rc := runDoctorCommand([]string{path}); rc != 1 {
		t.Errorf("rc = %d, want 1 for missing file", rc)
	}
}

// TestRunDoctorCommand_AllProjectsLayeringFires verifies the
// 2026-06-04 follow-up: `gnoma doctor --all-projects` runs
// cross-file layering checks between the global config and
// every registered project's config, catching the original
// silent-corruption bug.
func TestRunDoctorCommand_AllProjectsLayeringFires(t *testing.T) {
	dir := t.TempDir()
	t.Setenv("XDG_CONFIG_HOME", dir)

	// Global has router.prefer = "cloud".
	globalDir := filepath.Join(dir, "gnoma")
	if err := os.MkdirAll(globalDir, 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.WriteFile(
		filepath.Join(globalDir, "config.toml"),
		[]byte("[router]\nprefer = \"cloud\"\n"),
		0o644,
	); err != nil {
		t.Fatalf("seed global: %v", err)
	}

	// Project has router.prefer = "" — the original symptom.
	projectDir := filepath.Join(dir, "shadowed-project")
	projectGnomaDir := filepath.Join(projectDir, ".gnoma")
	if err := os.MkdirAll(projectGnomaDir, 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.WriteFile(
		filepath.Join(projectGnomaDir, "config.toml"),
		[]byte("[router]\nprefer = \"\"\n"),
		0o644,
	); err != nil {
		t.Fatalf("seed project: %v", err)
	}

	// Register the project.
	reg, _ := gnomacfg.LoadRegistry()
	if err := reg.Record(projectDir); err != nil {
		t.Fatalf("Record: %v", err)
	}

	// Capture stdout.
	origStdout := os.Stdout
	r, w, _ := os.Pipe()
	os.Stdout = w
	t.Cleanup(func() { os.Stdout = origStdout })

	rc := runDoctorCommand([]string{"--all-projects"})
	_ = w.Close()
	if rc != 1 {
		t.Errorf("rc = %d, want 1 (shadowing finding should trigger non-zero exit)", rc)
	}

	buf := make([]byte, 8192)
	n, _ := r.Read(buf)
	out := string(buf[:n])

	if !strings.Contains(out, "router.prefer") {
		t.Errorf("output missing shadowing key, got:\n%s", out)
	}
	if !strings.Contains(out, "shadow") {
		t.Errorf("output missing shadowing message, got:\n%s", out)
	}
}
+65
-20
@@ -87,6 +87,9 @@ func main() {
		fmt.Fprintf(os.Stderr, " gnoma slm setup download and verify the llamafile model\n")
		fmt.Fprintf(os.Stderr, " gnoma slm status show SLM setup state\n")
		fmt.Fprintf(os.Stderr, " gnoma router stats show router quality + classifier telemetry\n")
		fmt.Fprintf(os.Stderr, " gnoma config write a config key or list whitelisted keys\n")
		fmt.Fprintf(os.Stderr, " gnoma upgrade-config clean a config file in place (--dry-run previews; --all walks the registry)\n")
		fmt.Fprintf(os.Stderr, " gnoma doctor diagnostic scan; --all-projects walks the registry\n")
		fmt.Fprintf(os.Stderr, "\nFlags:\n")
		flag.PrintDefaults()
	}
@@ -180,9 +183,15 @@ func main() {
		case "slm":
			os.Exit(runSLMCommand(cliArgs[1:], cfg, logger))
		case "router":
			os.Exit(runRouterCommand(cliArgs[1:], profile))
			os.Exit(runRouterCommand(cliArgs[1:], cfg, profile))
		case "profile":
			os.Exit(runProfileCommand(cliArgs[1:], cfg, profile))
		case "config":
			os.Exit(runConfigCommand(cliArgs[1:]))
		case "upgrade-config":
			os.Exit(runUpgradeConfigCommand(cliArgs[1:]))
		case "doctor":
			os.Exit(runDoctorCommand(cliArgs[1:]))
		}
	}

@@ -230,6 +239,31 @@ func main() {
	}, safety.ScanCWDForSensitive(cwdAbs))
	fmt.Fprint(os.Stderr, banner)

	// Resolve the config once, here, so the rest of the startup
	// path (registry, firewall, tool registry, etc.) all share
	// one Resolved view. Pointer-converted fields with defaults
	// substituted are read via resolved.*; raw cfg.* is
	// internal after this point.
	resolved := cfg.Resolved()

	// Record the project in the user-level registry (Phase 2 of
	// the 2026-05-24 config-migration plan). Failure is
	// non-fatal — the registry is a convenience for
	// `gnoma doctor --all-projects` and
	// `gnoma upgrade-config --all`, never a hard dependency
	// on startup. Resolved().ProjectRegistry defaults to true;
	// the user can opt out via [config].project_registry = false
	// in their config file.
	if resolved.ProjectRegistry {
		if reg, err := gnomacfg.LoadRegistry(); err != nil {
			logger.Warn("project registry load failed (continuing)",
				"path", gnomacfg.RegistryFilePath(), "error", err)
		} else if err := reg.Record(gnomacfg.ProjectRoot()); err != nil {
			logger.Warn("project registry record failed (continuing)",
				"project", gnomacfg.ProjectRoot(), "error", err)
		}
	}

	knownProviders := map[string]bool{
		"mistral": true, "anthropic": true, "openai": true,
		"google": true, "ollama": true, "llamacpp": true,
@@ -319,8 +353,8 @@ func main() {

	// Create tool registry
	reg := buildToolRegistry(fsGuard)
	if cfg.Tools.MaxFileSize > 0 {
		w := fs.NewWriteTool(fs.WithMaxFileSize(cfg.Tools.MaxFileSize))
	if resolved.Tools.MaxFileSize > 0 {
		w := fs.NewWriteTool(fs.WithMaxFileSize(resolved.Tools.MaxFileSize))
		w.SetGuard(fsGuard)
		reg.Register(w)
	}
@@ -387,7 +421,7 @@ func main() {

	// Create session store. Per-profile session dir keeps work/private
	// sessions from cross-contaminating the resume list.
	sessStore := session.NewSessionStoreAt(profile.SessionDir(gnomacfg.ProjectRoot()), cfg.Session.MaxKeep, logger)
	sessStore := session.NewSessionStoreAt(profile.SessionDir(gnomacfg.ProjectRoot()), resolved.Session.MaxKeep, logger)

	// FirewallRef holds the *Firewall via atomic.Pointer so it can be
	// installed into SafeProvider wrappers before NewFirewall runs below
@@ -591,10 +625,7 @@ func main() {
	)

	// Create firewall
	entropyThreshold := 4.5
	if cfg.Security.EntropyThreshold > 0 {
		entropyThreshold = cfg.Security.EntropyThreshold
	}
	entropyThreshold := resolved.Security.EntropyThreshold
	fw := security.NewFirewall(security.FirewallConfig{
		ScanOutgoing: true,
		ScanToolResults: true,
@@ -821,7 +852,7 @@ func main() {
	}

	// Derive context window size from registered arm capabilities (accurate) or fall back to heuristic
	contextWindowSize := int64(cfg.Provider.MaxTokens) * 20
	contextWindowSize := resolved.Provider.MaxTokens * 20
	if arm, ok := rtr.LookupArm(armID); ok && arm.Capabilities.ContextWindow > 0 {
		contextWindowSize = int64(arm.Capabilities.ContextWindow)
		logger.Debug("context window from arm capabilities", "arm", armID, "context_window", contextWindowSize)
@@ -867,7 +898,7 @@ func main() {
			BaseURL: cfg.SLM.BaseURL,
			ModelURL: cfg.SLM.ModelURL,
			DataDir: cfg.SLM.DataDir,
			StartupTimeout: cfg.SLM.StartupTimeout.Duration(),
			StartupTimeout: resolved.SLM.StartupTimeout,
		}
		fmt.Fprintln(os.Stderr, "Starting SLM...")
		boot, bootErr := slm.StartBackend(context.Background(), bcfg, logger)
@@ -881,21 +912,35 @@ func main() {
			// transport and as a router arm. Both paths route through the
			// firewall after fwRef.Set fires above.
			slmProvider := security.WrapProvider(boot.Provider, fwRef)
			lazy.set(slm.NewClassifier(slmProvider, boot.Model, logger))
			lazy.set(slm.NewClassifier(slmProvider, boot.Model, resolved.SLM.ClassifyTimeout, logger))
			// ToolUse comes from the live probe of the actual model. For
			// completion-only models (e.g. TinyLlama), the SLM arm only
			// handles knowledge-only prompts where the trivial-prompt
			// heuristic flipped RequiresTools=false. For tool-capable
			// models, the SLM also covers simple file reads etc., gated
			// by MaxComplexity=0.3.
			rtr.RegisterArm(&router.Arm{
				ID: router.ArmID("slm/" + string(boot.Backend)),
				Provider: slmProvider,
				ModelName: boot.Model,
				IsLocal: true,
				MaxComplexity: 0.3,
				Capabilities: provider.Capabilities{ToolUse: boot.ToolSupport},
			})
			//
			// [slm].register_as_arm gates the dual-role registration.
			// Default (nil) is true to preserve pre-config behaviour.
			// Explicit false makes the SLM classifier-only, which is
			// the correct setting for task-specialised models
			// (FunctionGemma, code-completion-tuned models, etc.) that
			// would mishandle a general prompt routed to them as the
			// answer-producing arm. Resolved() applies the default-true
			// substitution; see ResolvedSLMSection in resolve.go.
			if resolved.SLM.RegisterAsArm {
				rtr.RegisterArm(&router.Arm{
					ID: router.ArmID("slm/" + string(boot.Backend)),
					Provider: slmProvider,
					ModelName: boot.Model,
					IsLocal: true,
					MaxComplexity: 0.3,
					Capabilities: provider.Capabilities{ToolUse: boot.ToolSupport},
				})
			} else {
				logger.Info("SLM registered as classifier only ([slm].register_as_arm=false)",
					"model", boot.Model)
			}
			slmCleanup = boot.Close
			slmInfo.Active = true
			slmInfo.Backend = string(boot.Backend)
@@ -938,7 +983,7 @@ func main() {
		Store: store,
		Hooks: dispatcher,
		Logger: logger,
		ForceTwoStageTools: cfg.Router.ForceTwoStage,
		ForceTwoStageTools: resolved.Router.ForceTwoStage,
	})
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)

+12
-11
@@ -158,6 +158,7 @@ func runProfileShow(name string) int {
// API key *values* are never printed — only the set of configured
// providers. Extracted for testing.
func formatProfileShow(w io.Writer, cfg *gnomacfg.Config, profile gnomacfg.Profile, profilePath, baseConfigPath, globalDir, projectRoot string) {
	resolved := cfg.Resolved()
	if profile.Active {
		pf(w, "Profile: %s\n", profile.Name)
	} else {
@@ -176,8 +177,8 @@ func formatProfileShow(w io.Writer, cfg *gnomacfg.Config, profile gnomacfg.Profi
	if cfg.Provider.Model != "" {
		pf(w, " model = %s\n", cfg.Provider.Model)
	}
	if cfg.Provider.MaxTokens > 0 {
		pf(w, " max_tokens = %d\n", cfg.Provider.MaxTokens)
	if resolved.Provider.MaxTokens > 0 {
		pf(w, " max_tokens = %d\n", resolved.Provider.MaxTokens)
	}
	if len(cfg.Provider.APIKeys) > 0 {
		pf(w, " api_keys = %s\n", sortedKeys(cfg.Provider.APIKeys))
@@ -227,24 +228,24 @@ func formatProfileShow(w io.Writer, cfg *gnomacfg.Config, profile gnomacfg.Profi
		}
	}

	if cfg.Router.ForceTwoStage {
	if resolved.Router.ForceTwoStage {
		pln(w, "\n[router]")
		pf(w, " force_two_stage = %v\n", cfg.Router.ForceTwoStage)
		pf(w, " force_two_stage = %v\n", resolved.Router.ForceTwoStage)
	}

	if cfg.Tools.BashTimeout.Duration() > 0 || cfg.Tools.MaxFileSize > 0 {
	if resolved.Tools.BashTimeout > 0 || resolved.Tools.MaxFileSize > 0 {
		pln(w, "\n[tools]")
		if cfg.Tools.BashTimeout.Duration() > 0 {
			pf(w, " bash_timeout = %s\n", cfg.Tools.BashTimeout.Duration())
		if resolved.Tools.BashTimeout > 0 {
			pf(w, " bash_timeout = %s\n", resolved.Tools.BashTimeout)
		}
		if cfg.Tools.MaxFileSize > 0 {
			pf(w, " max_file_size = %d\n", cfg.Tools.MaxFileSize)
		if resolved.Tools.MaxFileSize > 0 {
			pf(w, " max_file_size = %d\n", resolved.Tools.MaxFileSize)
		}
	}

	if cfg.Session.MaxKeep > 0 {
	if resolved.Session.MaxKeep > 0 {
		pln(w, "\n[session]")
		pf(w, " max_keep = %d\n", cfg.Session.MaxKeep)
		pf(w, " max_keep = %d\n", resolved.Session.MaxKeep)
	}

	pln(w)

@@ -185,7 +185,7 @@ func TestFormatProfileShow_PopulatedConfig(t *testing.T) {
		{Name: "fs", Command: "mcp-fs"},
	}
	cfg.Plugins.Enabled = []string{"git-tools"}
	cfg.Router.ForceTwoStage = true
	cfg.Router.ForceTwoStage = func() *bool { v := true; return &v }()

	prof := gnomacfg.Profile{Active: true, Name: "work"}

+31
-8
@@ -12,7 +12,7 @@ import (
)

// runRouterCommand handles `gnoma router <subcommand>`. Returns an exit code.
func runRouterCommand(args []string, profile gnomacfg.Profile) int {
func runRouterCommand(args []string, cfg *gnomacfg.Config, profile gnomacfg.Profile) int {
	if len(args) == 0 {
		fmt.Fprintln(os.Stderr, "usage: gnoma router <command>")
		fmt.Fprintln(os.Stderr, "commands:")
@@ -21,14 +21,14 @@ func runRouterCommand(args []string, profile gnomacfg.Profile) int {
	}
	switch args[0] {
	case "stats":
		return runRouterStats(profile)
		return runRouterStats(cfg, profile)
	default:
		fmt.Fprintf(os.Stderr, "unknown router command: %s\n", args[0])
		return 1
	}
}

func runRouterStats(profile gnomacfg.Profile) int {
func runRouterStats(cfg *gnomacfg.Config, profile gnomacfg.Profile) int {
	path := profile.QualityFile(gnomacfg.GlobalConfigDir())
	data, err := os.ReadFile(path)
	if err != nil {
@@ -52,7 +52,7 @@ func runRouterStats(profile gnomacfg.Profile) int {
	}
	printArmTable(snap)
	fmt.Println()
	printClassifierTable(snap)
	printClassifierTable(snap, cfg)
	return 0
}

@@ -86,7 +86,7 @@ func printArmTable(snap router.QualitySnapshot) {
	_ = tw.Flush()
}

func printClassifierTable(snap router.QualitySnapshot) {
func printClassifierTable(snap router.QualitySnapshot, cfg *gnomacfg.Config) {
	fmt.Println("Classifier source breakdown:")
	counts := snap.ClassifierCounts
	if len(counts) == 0 {
@@ -125,16 +125,39 @@ func printClassifierTable(snap router.QualitySnapshot) {
	_ = tw.Flush()
	fmt.Printf(" total observations: %d\n", total)

	// Phase-4 trust hint.
	// Effective heuristic share: both pure heuristic and slm_fallback
	// observations were routed via the HeuristicClassifier — the only
	// difference is whether the SLM was attempted first. Surfacing the
	// combined share answers "how often did the SLM actually drive
	// routing?" honestly.
	effectiveHeuristic := counts["heuristic"] + counts["slm_fallback"]
	if total > 0 {
		fmt.Printf(" effective heuristic share: %.1f%% (%d fallbacks + %d pure heuristic)\n",
			float64(effectiveHeuristic)/float64(total)*100,
			counts["slm_fallback"], counts["heuristic"])
	}

	// Phase-4 trust hint. Distinguishes the three diagnostic cases —
	// SLM never called, SLM called but every call failed, SLM working
	// but minority share — and templates the actionable advice off
	// the configured backend so the hint doesn't mention llamafile
	// when the user is on ollama (or vice versa).
	slmShare := 0.0
	if total > 0 {
		slmShare = float64(counts["slm"]) / float64(total) * 100
	}
	backend := "the SLM"
	if cfg != nil && cfg.SLM.Backend != "" {
		backend = cfg.SLM.Backend
	}
	switch {
	case total < 50:
		fmt.Println(" hint: < 50 observations — too sparse for Phase 4 trust signal yet.")
	case counts["slm"] == 0:
		fmt.Println(" hint: SLM has never classified — check that llamafile boots before short-lived runs end.")
	case counts["slm"] == 0 && counts["slm_fallback"] == 0:
		fmt.Printf(" hint: SLM never called — check [slm].enabled and that %s is reachable.\n", backend)
	case counts["slm"] == 0 && counts["slm_fallback"] > 0:
		fmt.Printf(" hint: SLM was called %d times but every call fell back — run with `--verbose` to see the underlying error (likely a timeout or parse failure for %s).\n",
			counts["slm_fallback"], backend)
	case slmShare < 50:
		fmt.Printf(" hint: SLM share is %.0f%% — fallback is doing most of the work.\n", slmShare)
	}
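The order of the `switch` cases matters here, so it may help to see the new-side logic as a pure function. The sketch below assumes the bare `counts["slm"] == 0` case with the llamafile wording is the removed side of the diff; if it stayed ahead of the two narrower cases, those would never be reached. `hintKind` and `classifyHint` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// hintKind labels the diagnostic cases; the names are illustrative only.
type hintKind string

const (
	hintNone        hintKind = ""
	hintSparse      hintKind = "sparse"
	hintNeverCalled hintKind = "never-called"
	hintAllFallback hintKind = "all-fallback"
	hintMinority    hintKind = "minority"
)

// classifyHint follows the order of the new-side switch: sparse data
// first, then the two "SLM produced nothing" cases, then minority share.
func classifyHint(counts map[string]int, total int) hintKind {
	slmShare := 0.0
	if total > 0 {
		slmShare = float64(counts["slm"]) / float64(total) * 100
	}
	switch {
	case total < 50:
		return hintSparse
	case counts["slm"] == 0 && counts["slm_fallback"] == 0:
		return hintNeverCalled
	case counts["slm"] == 0 && counts["slm_fallback"] > 0:
		return hintAllFallback
	case slmShare < 50:
		return hintMinority
	}
	return hintNone
}

func main() {
	counts := map[string]int{"heuristic": 80, "slm_fallback": 20}
	fmt.Println(classifyHint(counts, 100))
}
```

Returning a label instead of printing would also let the hint selection be unit-tested separately from the backend-specific wording.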

@@ -0,0 +1,216 @@
package main

import (
	"fmt"
	"os"
	"sort"

	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
)

// runUpgradeConfigCommand handles `gnoma upgrade-config`. Cleans
// a single config file in place: drops fields whose value matches
// the resolved default, leaves explicit-zero pointer fields alone,
// writes the cleaned form atomically with a `.bak-YYYYMMDD-HHMMSS`
// backup of the original.
//
// Modes:
//   - `gnoma upgrade-config` (no args) → project config
//   - `gnoma upgrade-config --global` → global config
//   - `gnoma upgrade-config <path>` → the given path
//   - `gnoma upgrade-config --all` → walk the registry,
//     upgrade global + every known project's config
//   - `gnoma upgrade-config --global <path>` → error (mutually exclusive)
//   - `gnoma upgrade-config --all <path>` → error (mutually exclusive)
//
// If the default target (project or global config) doesn't exist,
// print a friendly "nothing to upgrade" message and exit 0 rather
// than failing. The user can pass an explicit path to upgrade a
// different file. `--all` reports per-file results and exits 1 if
// any per-file handler returned non-zero. Pending dry-run changes
// do not affect the exit code.
func runUpgradeConfigCommand(args []string) int {
	// Walk args in a single pass, building pathArgs into a fresh
	// slice. Using args[:i] / args[i+1:] in-place would alias the
	// underlying array and corrupt subsequent iterations' `a`
	// reads (a known Go slice footgun). The fresh-slice approach
	// keeps the parsing correct regardless of flag ordering.
	var pathArgs []string
	dryRun := false
	global := false
	all := false
	for _, a := range args {
		switch a {
		case "--dry-run":
			dryRun = true
		case "--global":
			global = true
		case "--all":
			all = true
		default:
			pathArgs = append(pathArgs, a)
		}
	}

	// --global / --all and an explicit path are mutually exclusive.
	if (global || all) && len(pathArgs) > 0 {
		fmt.Fprintln(os.Stderr, "usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")
		return 1
	}
	if global && all {
		fmt.Fprintln(os.Stderr, "usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")
		return 1
	}

	// --all mode: walk the registry.
	if all {
		return runUpgradeConfigAll(dryRun)
	}

	target := ""
	switch {
	case global:
		target = gnomacfg.GlobalConfigPath()
	case len(pathArgs) == 0:
		target = gnomacfg.ProjectConfigPath()
	case len(pathArgs) == 1:
		target = pathArgs[0]
	default:
		fmt.Fprintln(os.Stderr, "usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")
		return 1
	}

	// Friendly "nothing to upgrade" when the default target
	// doesn't exist. We only do this for the default targets
	// (project/global); an explicit path the user typed that
	// doesn't exist is a real error surfaced by Upgrade() below.
	if global || len(pathArgs) == 0 {
		if _, err := os.Stat(target); os.IsNotExist(err) {
			fmt.Printf("%s: no such file, nothing to upgrade\n", target)
			fmt.Println("hint: pass an explicit path, or use --global for the user-level config")
			return 0
		}
	}

	if dryRun {
		return runUpgradeConfigDryRun(target)
	}
	return runUpgradeConfigApply(target)
}
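The argument handling above (a single pass that collects flags and copies positional arguments into a fresh slice, followed by mutual-exclusion checks) can be sketched as a parser that returns a value instead of printing. `upgradeArgs`, `errUsage`, and `parseUpgradeArgs` are hypothetical names for illustration; the rules are taken from the function above.

```go
package main

import (
	"errors"
	"fmt"
)

// upgradeArgs is an illustrative result type for the parsed command line.
type upgradeArgs struct {
	dryRun, global, all bool
	paths               []string
}

// errUsage stands in for the usage message the real command prints.
var errUsage = errors.New("usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")

// parseUpgradeArgs walks args once. Positional arguments are appended to
// a fresh slice, so the input slice is never modified while ranging.
func parseUpgradeArgs(args []string) (upgradeArgs, error) {
	var out upgradeArgs
	for _, a := range args {
		switch a {
		case "--dry-run":
			out.dryRun = true
		case "--global":
			out.global = true
		case "--all":
			out.all = true
		default:
			out.paths = append(out.paths, a)
		}
	}
	switch {
	case (out.global || out.all) && len(out.paths) > 0:
		return out, errUsage // flag and explicit path are mutually exclusive
	case out.global && out.all:
		return out, errUsage
	case len(out.paths) > 1:
		return out, errUsage // at most one explicit path
	}
	return out, nil
}

func main() {
	parsed, err := parseUpgradeArgs([]string{"x.toml", "--dry-run"})
	fmt.Println(parsed.dryRun, parsed.paths, err)
}
```

Because flags are matched by exact string and everything else is treated as a path, flag order does not matter, which matches the behaviour the comment above describes.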

// runUpgradeConfigAll walks the registry and upgrades the
// global config + every known project's config. Per-file
// behaviour mirrors the single-file path: a friendly "no such
// file" line (not a failure) when the project hasn't grown its
// config yet, real Upgrade() on files that exist, backup+diff on
// changes. Returns non-zero if any file failed. Reporting changed
// files through the exit code (so CI could catch dirty configs
// in dry-run mode) is deferred; see the anyChanged note below.
func runUpgradeConfigAll(dryRun bool) int {
	loaded, err := gnomacfg.LoadRegistry()
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: load registry: %v\n", err)
		return 1
	}

	// Always include the global config; then per-project.
	paths := []string{gnomacfg.GlobalConfigPath()}
	for _, p := range loaded.Projects {
		paths = append(paths, gnomacfg.ProjectConfigPathFor(p.Path))
	}
	// Dedupe + sort for deterministic output. (Dedupe only matters
	// when two entries resolve to the same config path, which is
	// uncommon but possible.)
	seen := map[string]bool{}
	var deduped []string
	for _, p := range paths {
		if seen[p] {
			continue
		}
		seen[p] = true
		deduped = append(deduped, p)
	}
	sort.Strings(deduped)
	paths = deduped

	anyFailed := false
	anyChanged := false
	for _, p := range paths {
		// Friendly "no such file" on first run — many registered
		// projects won't have a .gnoma/config.toml yet.
		if _, err := os.Stat(p); os.IsNotExist(err) {
			fmt.Printf("%s: no such file, nothing to upgrade\n", p)
			continue
		}

		var rc int
		if dryRun {
			rc = runUpgradeConfigDryRun(p)
		} else {
			rc = runUpgradeConfigApply(p)
		}
		if rc != 0 {
			anyFailed = true
		}
		// Per-file handlers print their own "upgraded" /
		// "already clean" line; the aggregate exit code just
		// reports "any failure". (Tracking "any change" would
		// need a non-printing variant of the helpers; deferred.)
		_ = anyChanged
	}

	if anyFailed {
		return 1
	}
	return 0
}
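The dedupe-then-sort step is small enough to pull out on its own. This is a minimal sketch; `dedupeSorted` is a hypothetical helper name, and it uses a `map[string]struct{}` set where the function above uses `map[string]bool`, which is equivalent for this purpose.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupeSorted returns the unique entries of paths in ascending order.
// The input slice is left untouched.
func dedupeSorted(paths []string) []string {
	seen := make(map[string]struct{}, len(paths))
	out := make([]string, 0, len(paths))
	for _, p := range paths {
		if _, dup := seen[p]; dup {
			continue
		}
		seen[p] = struct{}{}
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(dedupeSorted([]string{"/b/config.toml", "/a/config.toml", "/b/config.toml"}))
}
```

Sorting means the global config is no longer guaranteed to be processed first; the output order is purely lexical.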

func runUpgradeConfigApply(path string) int {
	res, err := gnomacfg.Upgrade(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return 1
	}
	if !res.Changed {
		fmt.Printf("%s: already clean, nothing to do\n", path)
		return 0
	}
	fmt.Printf("%s: upgraded (backup at %s)\n\n", path, res.BackupPath)
	fmt.Println(res.Diff)
	return 0
}

func runUpgradeConfigDryRun(path string) int {
	// For the dry-run, run Upgrade, restore the original from the
	// backup it leaves behind, and only print the diff.
	// (Upgrade is destructive by design: it writes the cleaned
	// form before we have a chance to inspect the diff. The
	// backup+restore dance lets us preview without committing.)
	res, err := gnomacfg.Upgrade(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return 1
	}
	if !res.Changed {
		fmt.Printf("%s: already clean, nothing to do (dry run)\n", path)
		return 0
	}
	// Restore the original from the backup so the dry-run leaves
	// the file as it found it.
	if err := os.Rename(res.BackupPath, path); err != nil {
		// The cleaned form is still on disk at this point, so
		// report a failure instead of claiming nothing was written.
		fmt.Fprintf(os.Stderr, "error: dry-run restore failed; original is at %s: %v\n", res.BackupPath, err)
		return 1
	}
	// A successful rename moves the backup back to the original
	// path, so this os.Remove normally hits a not-exist error,
	// which is ignored. It only warns if a stray .bak is somehow
	// still present and cannot be deleted.
	if err := os.Remove(res.BackupPath); err != nil && !os.IsNotExist(err) {
		fmt.Fprintf(os.Stderr, "warning: could not remove dry-run backup %s: %v\n", res.BackupPath, err)
	}
	fmt.Printf("%s: would upgrade (dry run; no changes written)\n\n", path)
	fmt.Println(res.Diff)
	return 0
}
@@ -0,0 +1,292 @@
package main

import (
	"os"
	"path/filepath"
	"strings"
	"testing"

	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
)

// TestRunUpgradeConfig_DropsDefaultPointerField exercises the
// happy path: a project config with `max_tokens = 8192` (the
// default) gets the field dropped and a backup created.
func TestRunUpgradeConfig_DropsDefaultPointerField(t *testing.T) {
	dir := t.TempDir()
	t.Setenv("XDG_CONFIG_HOME", dir)

	origDir, _ := os.Getwd()
	projectDir := filepath.Join(dir, "project")
	if err := os.MkdirAll(projectDir, 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.Chdir(projectDir); err != nil {
		t.Fatalf("chdir: %v", err)
	}
	t.Cleanup(func() { _ = os.Chdir(origDir) })

	path := filepath.Join(projectDir, ".gnoma", "config.toml")
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		t.Fatalf("mkdir: %v", err)
	}
	if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
		t.Fatalf("seed: %v", err)
	}

	if rc := runUpgradeConfigApply(path); rc != 0 {
		t.Fatalf("runUpgradeConfigApply rc=%d", rc)
	}
	got, _ := os.ReadFile(path)
	if strings.Contains(string(got), "max_tokens") {
		t.Errorf("max_tokens at default not dropped, got:\n%s", got)
	}
	// Backup file exists.
	entries, _ := os.ReadDir(filepath.Dir(path))
	backupFound := false
	for _, e := range entries {
		if strings.HasPrefix(e.Name(), "config.toml.bak-") {
			backupFound = true
			break
		}
	}
	if !backupFound {
		t.Errorf("no backup file created in %s", filepath.Dir(path))
	}
}

// TestRunUpgradeConfig_DryRunNoSideEffects verifies that
|
||||
// --dry-run previews the diff without leaving the file modified.
|
||||
func TestRunUpgradeConfig_DryRunNoSideEffects(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
origDir, _ := os.Getwd()
|
||||
projectDir := filepath.Join(dir, "project")
|
||||
if err := os.MkdirAll(projectDir, 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
if err := os.Chdir(projectDir); err != nil {
|
||||
t.Fatalf("chdir: %v", err)
|
||||
}
|
||||
t.Cleanup(func() { _ = os.Chdir(origDir) })
|
||||
|
||||
path := filepath.Join(projectDir, ".gnoma", "config.toml")
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
original := "[provider]\nmax_tokens = 8192\n"
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if rc := runUpgradeConfigDryRun(path); rc != 0 {
|
||||
t.Fatalf("runUpgradeConfigDryRun rc=%d", rc)
|
||||
}
|
||||
|
||||
// File should be byte-identical to the original.
|
||||
got, _ := os.ReadFile(path)
|
||||
if string(got) != original {
|
||||
t.Errorf("dry-run modified the file, got:\n%s\nwant:\n%s", got, original)
|
||||
}
|
||||
|
||||
// No backup file should remain (dry-run cleans up its own backup).
|
||||
entries, _ := os.ReadDir(filepath.Dir(path))
|
||||
for _, e := range entries {
|
||||
if e.Name() != "config.toml" {
|
||||
t.Errorf("dry-run left extra file: %q", e.Name())
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_AlreadyCleanIsNoOp verifies that a config
|
||||
// that has only user-set non-default values produces a "nothing
|
||||
// to do" message and exit 0 — no backup, no rewrite.
|
||||
func TestRunUpgradeConfig_AlreadyCleanIsNoOp(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
origDir, _ := os.Getwd()
|
||||
projectDir := filepath.Join(dir, "project")
|
||||
if err := os.MkdirAll(projectDir, 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
if err := os.Chdir(projectDir); err != nil {
|
||||
t.Fatalf("chdir: %v", err)
|
||||
}
|
||||
t.Cleanup(func() { _ = os.Chdir(origDir) })
|
||||
|
||||
path := filepath.Join(projectDir, ".gnoma", "config.toml")
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
clean := "[provider]\ndefault = \"anthropic\"\n"
|
||||
if err := os.WriteFile(path, []byte(clean), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if rc := runUpgradeConfigApply(path); rc != 0 {
|
||||
t.Errorf("rc = %d, want 0 for already-clean file", rc)
|
||||
}
|
||||
|
||||
// File content unchanged.
|
||||
got, _ := os.ReadFile(path)
|
||||
if string(got) != clean {
|
||||
t.Errorf("already-clean file modified, got:\n%s", got)
|
||||
}
|
||||
// No backup created.
|
||||
entries, _ := os.ReadDir(filepath.Dir(path))
|
||||
for _, e := range entries {
|
||||
if e.Name() != "config.toml" {
|
||||
t.Errorf("no-op left extra file: %q", e.Name())
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_MissingProjectConfigIsFriendly verifies the
|
||||
// user-experience fix for the 2026-06-04 follow-up: when the
|
||||
// project .gnoma/config.toml doesn't exist, print a friendly
|
||||
// "nothing to upgrade" message and exit 0 instead of a hard
|
||||
// "no such file or directory" error. The user can pass an
|
||||
// explicit path or use --global.
|
||||
func TestRunUpgradeConfig_MissingProjectConfigIsFriendly(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
origDir, _ := os.Getwd()
|
||||
projectDir := filepath.Join(dir, "project")
|
||||
if err := os.MkdirAll(projectDir, 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
if err := os.Chdir(projectDir); err != nil {
|
||||
t.Fatalf("chdir: %v", err)
|
||||
}
|
||||
t.Cleanup(func() { _ = os.Chdir(origDir) })
|
||||
|
||||
// No .gnoma/ dir at all — Upgrade() would error.
|
||||
if rc := runUpgradeConfigCommand(nil); rc != 0 {
|
||||
t.Errorf("rc = %d, want 0 for missing project config (friendly exit)", rc)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_MissingGlobalConfigIsFriendly mirrors
|
||||
// the above for --global. The user-level config not existing
|
||||
// is also "nothing to upgrade", not an error.
|
||||
func TestRunUpgradeConfig_MissingGlobalConfigIsFriendly(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
// Don't create the global config dir either.
|
||||
|
||||
if rc := runUpgradeConfigCommand([]string{"--global"}); rc != 0 {
|
||||
t.Errorf("rc = %d, want 0 for missing global config (friendly exit)", rc)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_GlobalFlagUpgradesGlobalConfig verifies
|
||||
// the --global flag actually points at the global config and
|
||||
// upgrades it.
|
||||
func TestRunUpgradeConfig_GlobalFlagUpgradesGlobalConfig(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
// Seed a global config with a default-equivalent field.
|
||||
globalDir := filepath.Join(dir, "gnoma")
|
||||
if err := os.MkdirAll(globalDir, 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
globalPath := filepath.Join(globalDir, "config.toml")
|
||||
if err := os.WriteFile(globalPath, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if rc := runUpgradeConfigCommand([]string{"--global"}); rc != 0 {
|
||||
t.Errorf("rc = %d, want 0", rc)
|
||||
}
|
||||
|
||||
got, _ := os.ReadFile(globalPath)
|
||||
if strings.Contains(string(got), "max_tokens") {
|
||||
t.Errorf("max_tokens at default not dropped from global config, got:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_GlobalWithExplicitPathIsError verifies
|
||||
// the mutually-exclusive-flag handling: --global and an
|
||||
// explicit path can't both be supplied.
|
||||
func TestRunUpgradeConfig_GlobalWithExplicitPathIsError(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
if rc := runUpgradeConfigCommand([]string{"--global", "/tmp/somewhere/config.toml"}); rc != 1 {
|
||||
t.Errorf("rc = %d, want 1 for --global + explicit path", rc)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_AllFlagWalksRegistry verifies the
|
||||
// --all mode: a registry with one project that has a
|
||||
// zero-spammed config gets that config upgraded.
|
||||
func TestRunUpgradeConfig_AllFlagWalksRegistry(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
// Seed a registry entry pointing at a project with a
|
||||
// zero-spammed config.
|
||||
projectDir := filepath.Join(dir, "project")
|
||||
if err := os.MkdirAll(filepath.Join(projectDir, ".gnoma"), 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
projectConfig := filepath.Join(projectDir, ".gnoma", "config.toml")
|
||||
if err := os.WriteFile(projectConfig, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed project: %v", err)
|
||||
}
|
||||
|
||||
reg, _ := gnomacfg.LoadRegistry()
|
||||
if err := reg.Record(projectDir); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
|
||||
if rc := runUpgradeConfigCommand([]string{"--all"}); rc != 0 {
|
||||
t.Errorf("rc = %d, want 0", rc)
|
||||
}
|
||||
|
||||
// Project config should be cleaned.
|
||||
got, _ := os.ReadFile(projectConfig)
|
||||
if strings.Contains(string(got), "max_tokens") {
|
||||
t.Errorf("max_tokens at default not dropped, got:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_AllFlagHandlesMissingProjectFiles
|
||||
// documents the "first-run" path: the registry might list
|
||||
// projects that haven't grown their config yet. The handler
|
||||
// should report "no such file" and exit 0.
|
||||
func TestRunUpgradeConfig_AllFlagHandlesMissingProjectFiles(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
// Seed a registry entry pointing at a project with NO
|
||||
// .gnoma/config.toml.
|
||||
projectDir := filepath.Join(dir, "project-no-config")
|
||||
if err := os.MkdirAll(projectDir, 0o755); err != nil {
|
||||
t.Fatalf("mkdir: %v", err)
|
||||
}
|
||||
|
||||
reg, _ := gnomacfg.LoadRegistry()
|
||||
if err := reg.Record(projectDir); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
|
||||
if rc := runUpgradeConfigCommand([]string{"--all"}); rc != 0 {
|
||||
t.Errorf("rc = %d, want 0 (missing files are friendly exits)", rc)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRunUpgradeConfig_AllFlagMutuallyExclusiveWithPath
|
||||
// verifies --all and an explicit path are mutually exclusive.
|
||||
func TestRunUpgradeConfig_AllFlagMutuallyExclusiveWithPath(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
t.Setenv("XDG_CONFIG_HOME", dir)
|
||||
|
||||
if rc := runUpgradeConfigCommand([]string{"--all", "/tmp/somewhere/config.toml"}); rc != 1 {
|
||||
t.Errorf("rc = %d, want 1 for --all + explicit path", rc)
|
||||
}
|
||||
}
|
||||
+24
-10
@@ -24,27 +24,41 @@ The "ollama" path is the easiest if you're already running a local model — it
|
||||
|
||||
## Presets
|
||||
|
||||
Presets use `reecdev/tiny3.5:500m` as the default model — a 500 M-parameter Qwen3.5 distillation with tool support, available on Ollama. Pull it once with:
|
||||
Presets use `qwen3:0.6b` as the default model — a 600 M-parameter Qwen3 instruction-tuned model with native `/no_think` support, available on Ollama. Pull it once with:
|
||||
|
||||
```bash
|
||||
ollama pull reecdev/tiny3.5:500m # ~1 GB
|
||||
# or the 1.5 B variant for slightly better quality:
|
||||
ollama pull reecdev/tiny3.5:1.5b # ~3 GB
|
||||
ollama pull qwen3:0.6b # ~520 MB
|
||||
```
|
||||
|
||||
### Model choice notes
|
||||
|
||||
Empirical testing (2026-05-25) of the candidate SLMs below on identical prompts:
|
||||
|
||||
| Model | Classifier success | Notes |
|---|---|---|
| `qwen3:0.6b` | consistent across trivial + knowledge prompts | recommended default; honours `/no_think` cleanly |
| `functiongemma:270m` | works on trivial prompts, derails on knowledge ones | needs function-signature prompt rewrite or LoRA fine-tune to be reliable |
| `gemma3:1b` | unusable | emits malformed JSON (just `{` or invented keys) |
| `reecdev/tiny3.5:1.5b` | unusable | thinking-mode distillation; ignores `/no_think` and emits `<Thought Process>` blocks |
| `qwen2.5-coder:1.5b` | unusable | code-completion-tuned; ignores the classifier prompt entirely and answers in prose |
|
||||
|
||||
Substitute any small Ollama model you prefer. The probe at startup reads each model's actual capability — `tools` enables the SLM arm to handle simple file reads; without it, the SLM only handles knowledge-only prompts.
|
||||
|
||||
If your SLM is task-specialised (function-call models like FunctionGemma; embedding-only models; code-completion-tuned models) and produces wrong-shape output when asked to answer a general prompt, set `register_as_arm = false` so the SLM stays classifier-only and execution routes to other local arms.
|
||||
|
||||
### Preset 1 — Ollama (recommended for most users)
|
||||
|
||||
```toml
|
||||
[slm]
|
||||
enabled = true
|
||||
backend = "ollama"
|
||||
model = "reecdev/tiny3.5:500m"
|
||||
enabled = true
|
||||
backend = "ollama"
|
||||
model = "qwen3:0.6b"
|
||||
register_as_arm = true # default; set false for classifier-only models
|
||||
classify_timeout = "15s" # default; bump for slow cold-load
|
||||
# base_url defaults to http://localhost:11434
|
||||
```
|
||||
|
||||
Prereq: `ollama pull reecdev/tiny3.5:500m` (or any model you'd rather use).
|
||||
Prereq: `ollama pull qwen3:0.6b` (or any model you'd rather use).
|
||||
|
||||
### Preset 2 — llama.cpp server
|
||||
|
||||
@@ -150,10 +164,10 @@ Output looks like:
|
||||
```
|
||||
slm enabled: true
|
||||
slm backend: ollama
|
||||
model: reecdev/tiny3.5:500m
|
||||
model: qwen3:0.6b
|
||||
|
||||
live probe:
|
||||
✓ ollama ready (model=reecdev/tiny3.5:500m, boot=0s)
|
||||
✓ ollama ready (model=qwen3:0.6b, boot=0s)
|
||||
```
|
||||
|
||||
Run a few prompts, then check:
|
||||
|
||||
@@ -1,5 +1,14 @@
|
||||
# Tool-Router Specialization (functiongemma) — 2026-05-23
|
||||
|
||||
> **Companion plan from 2026-05-25:**
|
||||
> [`2026-05-25-encoder-bandit-router.md`](2026-05-25-encoder-bandit-router.md)
|
||||
> sketches an alternative architecture (encoder + contextual bandit
|
||||
> instead of decoder-SLM-as-classifier). The two are complementary,
|
||||
> not competing — FunctionGemma fits as the optional Phase 5 "JSON
|
||||
> sanity layer" in that plan. Decide which track to invest in based
|
||||
> on the did-switch-rate telemetry (this plan) vs the bandit-data
|
||||
> accumulation (companion plan).
|
||||
|
||||
Follow-up to
|
||||
[`2026-05-19-post-slm-unlock.md`](2026-05-19-post-slm-unlock.md)
|
||||
Phase A, which shipped two-stage tool routing: round 1 sends a single
|
||||
|
||||
@@ -0,0 +1,344 @@
|
||||
# Encoder + Contextual-Bandit Router — 2026-05-25
|
||||
|
||||
Proposes a long-arc architectural rethink of gnoma's routing layer:
|
||||
**replace the decoder-SLM-as-classifier design with an encoder-only
|
||||
embedding model feeding a contextual bandit policy**, and treat a
|
||||
strict tiny SLM (FunctionGemma-270M-it) as the optional "emit a
|
||||
structured route decision" layer rather than the primary classifier.
|
||||
|
||||
Surfaced from external research (RouteLLM, ModernBERT, Gemma 3
|
||||
270M, Qwen3-Embedding, BGE-M3) brought into the 2026-05-25
|
||||
diagnostic session where gnoma's current decoder-SLM classifier
|
||||
exhibited a 100% failure rate across two model swaps
|
||||
(`reecdev/tiny3.5:1.5b`, `qwen2.5-coder:1.5b`).
|
||||
|
||||
This plan is **strategic / multi-month**. Phase 1 below is the only
|
||||
piece scoped for near-term implementation; everything else hinges on
|
||||
the bandit-vs-SLM strategic decision tracked in the existing
|
||||
`Bandit selector — design decisions deferred` TODO entry.
|
||||
|
||||
Sibling plans:
|
||||
[`2026-05-23-tool-router-specialization.md`](2026-05-23-tool-router-specialization.md)
|
||||
already covers the **FunctionGemma fine-tune** track as the
|
||||
strict-SLM option; this plan adds the **encoder + bandit** track
|
||||
as the alternative (and arguably better-suited) architecture.
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
The current router has three coupled problems:
|
||||
|
||||
1. **The classifier is a decoder LLM in a job an encoder would do
|
||||
better.** Routing is a classification task with cost/quality
|
||||
trade-offs, not a reasoning task. Asking a decoder model to emit
|
||||
structured JSON for every classify call is high-latency, fragile
|
||||
to chain-of-thought leakage, and non-deterministic.
|
||||
|
||||
2. **The bandit can't actually learn quality** because the only
|
||||
success signal is `err == nil` (per `internal/engine/loop.go:118`).
|
||||
EMA scores converge to 1.00 for every arm — see the 2026-05-24
|
||||
`router stats` snapshot where 22 of 25 arm/task pairs sit at
|
||||
exactly 1.00.
|
||||
|
||||
3. **The classifier and bandit live in adjacent code but were
|
||||
designed in separate phases**, so the integration point (`Task`
|
||||
built by SLM classifier → fed to `selectBest`) is just data
|
||||
flow, not a learning loop. The SLM's wins/losses don't update
|
||||
the SLM; the bandit's wins/losses don't change which arms the
|
||||
classifier considers.
|
||||
|
||||
The 100% SLM-failure incident on 2026-05-25 made (1) urgent. The
|
||||
zero-discrimination EMA on 2026-05-24 made (2) urgent. (3) is the
|
||||
underlying integration debt.
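
Problem (2) can be reproduced in a few lines. The sketch below assumes a plain exponential moving average with a hypothetical smoothing factor `alpha` (the real update rule and constant are not shown in this plan) and feeds it nothing but successes, which is what an `err == nil` signal amounts to when arms rarely hard-fail.

```go
package main

import "fmt"

// updateEMA blends one observation into the running score.
// alpha is a hypothetical smoothing factor chosen for illustration.
func updateEMA(prev, obs, alpha float64) float64 {
	return alpha*obs + (1-alpha)*prev
}

// simulate feeds n success-only observations (reward 1.0) into the EMA,
// starting from the given prior score.
func simulate(start, alpha float64, n int) float64 {
	score := start
	for i := 0; i < n; i++ {
		score = updateEMA(score, 1.0, alpha)
	}
	return score
}

func main() {
	// Two arms with very different priors become indistinguishable once
	// the only signal they ever receive is "no error".
	fmt.Printf("weak arm:   %.4f\n", simulate(0.2, 0.2, 50))
	fmt.Printf("strong arm: %.4f\n", simulate(0.9, 0.2, 50))
}
```

Whatever the starting score, the gap to 1.0 shrinks by a factor of `1 - alpha` per observation, so the selector is left with no quality signal to discriminate on.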
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Killing the existing SLM classifier today.** Phase 1 of this
|
||||
plan is purely additive (encoder feature extraction); the existing
|
||||
classifier stays as a baseline until the new path is measurably
|
||||
better.
|
||||
- **Reimplementing bandit math.** LinUCB and Thompson Sampling are
|
||||
well-understood. The work is the feature pipeline and reward
|
||||
function, not the policy core.
|
||||
- **Choosing a single embedding model permanently.** Phase 1 ships
|
||||
with a default but exposes a `[slm.embedding].model` knob so
|
||||
swapping is config-only.
|
||||
- **The strict-SLM track.** FunctionGemma fine-tuning is the sibling
|
||||
`2026-05-23-tool-router-specialization.md` plan; this plan
|
||||
references it but does not duplicate it.
|
||||
|
||||
---
|
||||
|
||||
## Background — research summary
|
||||
|
||||
Citations follow the user-provided research thread (RouteLLM 2024,
|
||||
ModernBERT 2024, Google FunctionGemma 2025).
|
||||
|
||||
- **RouteLLM** tested router types as a classification problem:
|
||||
similarity routing, matrix factorization, BERT classifier, causal
|
||||
LLM classifier. The BERT classifier was competitive with the
|
||||
causal-LLM classifier at lower cost and latency. Routing is a
|
||||
classification task; treating it like a generation task is paying
|
||||
generation cost for classification value.
|
||||
- **ModernBERT** (Dec 2024) is an encoder-only model with 8k context,
|
||||
trained partly on code, designed for fast classification and
|
||||
retrieval. The 'base' size is ~150M parameters, the 'large' size
|
||||
~400M. Both are tiny compared to even small decoder LLMs.
|
||||
- **FunctionGemma-270M-it** (Aug 2025) is Google's small model
|
||||
fine-tuned for natural-language → function-call output. Google's
|
||||
own positioning materials list **query routing** as a use case.
|
||||
- **Qwen3-Embedding-0.6B** and **BGE-M3** are strong multilingual
|
||||
embedding models with long-context support; either can serve as
|
||||
feature extractors for downstream classification or bandit
|
||||
policies.
|
||||
|
||||
The throughline: **encoder models are the right tool for the
|
||||
classification side of routing**; generative SLMs (FunctionGemma)
|
||||
are the right tool only when the *output* must be a structured
|
||||
decision blob with confidence + tags + fallback. For pure routing,
|
||||
encoder features + bandit policy is cheaper, faster, more
|
||||
deterministic.
|
||||
|
||||
---
|
||||
|
||||
## Approach overview
|
||||
|
||||
Five phases. Phase 1 is near-term; Phases 2–4 are the actual
|
||||
architectural shift; Phase 5 is the long-arc fine-tune.
|
||||
|
||||
### Phase 1 — Embedding feature scaffold (near-term, additive)
|
||||
|
||||
Add an embedding pipeline that runs alongside the existing
|
||||
classifier. Extract features for every prompt; log them to disk
|
||||
next to the existing quality-EMA. No routing decision changes yet.
|
||||
|
||||
**Why first:** lets us build up a labelled dataset of (prompt,
|
||||
features, arm, outcome) tuples without disturbing today's routing
|
||||
behaviour. Phase 2 trains against this dataset.
|
||||
|
||||
### Phase 2 — Contextual bandit over the feature set
|
||||
|
||||
Once Phase 1 has ~500–1000 labelled observations, swap `selectBest`
|
||||
from heuristic quality + EMA score to a LinUCB-style contextual
|
||||
bandit that takes the embedding features + the existing arm metadata
|
||||
(MaxComplexity, CostWeight, Strengths). The existing EMA quality
|
||||
score becomes one feature among many.
|
||||
|
||||
### Phase 3 — Retire the decoder-SLM classifier
|
||||
|
||||
When Phase 2 routing is measurably better than today's heuristic +
|
||||
EMA blend, the decoder-SLM classifier (currently producing 0
|
||||
useful classifications on the user's setup) is no longer
|
||||
load-bearing. Deprecate it; keep the same `[slm]` config knobs for
|
||||
backwards compatibility but route them to a different runtime path.
|
||||
|
||||
### Phase 4 — ModernBERT fine-tune
|
||||
|
||||
The off-the-shelf embedding model from Phase 1 (BGE-M3 or
|
||||
Qwen3-Embedding-0.6B by default) gives general-purpose embeddings.
|
||||
Phase 4 fine-tunes a router-specific classification head on top of
|
||||
ModernBERT-base using the labelled dataset accumulated since Phase
|
||||
1. Pure performance win; falls back gracefully to off-the-shelf
|
||||
embeddings if the fine-tune isn't loaded.
|
||||
|
||||
### Phase 5 — FunctionGemma JSON sanity layer (optional)
|
||||
|
||||
For users who want a structured route decision (arm + confidence +
|
||||
fallback) alongside or instead of the bandit output, plug
|
||||
FunctionGemma-270M-it (fine-tuned per the
|
||||
`tool-router-specialization` plan) as a final-stage decision blob
|
||||
emitter. Sits *after* the encoder + bandit, not in front of them.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Embedding feature scaffold (detailed)
|
||||
|
||||
This is the only phase scoped for near-term implementation. The
|
||||
others depend on Phase 1's data accumulation.
|
||||
|
||||
### What lands
|
||||
|
||||
- New package `internal/router/features` with:
|
||||
- `Embedder` interface: `Embed(ctx, prompt string) ([]float32, error)`.
|
||||
- Implementations: `OllamaEmbedder`, `BGE3Embedder`, `NoopEmbedder`
|
||||
(default; returns nil features when no embedding model is
|
||||
configured).
|
||||
- New config `[slm.embedding]` section:
|
||||
```toml
[slm.embedding]
enabled  = false                   # default off; opt-in
backend  = "ollama"                # ollama | bge-m3 | noop
model    = "qwen3-embedding:0.6b"  # ollama model tag
base_url = ""                      # backend endpoint override
```
|
||||
- Feature extraction hook in `internal/engine/loop.go`: after the
|
||||
classifier runs but before `selectBest`, compute the embedding
|
||||
for the prompt and attach to the routing `Task` as an opaque
|
||||
`Features []float32` field.
|
||||
- New on-disk store at `~/.config/gnoma/router-features.jsonl`,
|
||||
one record per observation: `{ts, prompt_hash, features,
|
||||
task_type, arm_id, success, tokens, duration}`.
|
||||
- `prompt_hash` is a SHA-256 of the prompt (never the prompt
itself), so the file stays local-only without storing raw prompt text.
|
||||
- Append-only, atomic-write, incognito-gated, same discipline as
|
||||
the firewall audit log.
|
||||
- No selector change. `selectBest` continues to use today's
|
||||
heuristic + EMA blend. Phase 1 just observes.
|
||||
|
||||
### Why off by default
|
||||
|
||||
Embedding inference adds 50–200ms per prompt depending on backend
|
||||
and model size. That latency is fine for ollama users running on
|
||||
a workstation, painful for users on slower setups. Opt-in keeps
|
||||
the regression risk at zero.
|
||||
|
||||
### Phase 1 task list
|
||||
|
||||
- **F1-1:** Define the `Embedder` interface and `NoopEmbedder` in
|
||||
`internal/router/features/`.
|
||||
- **F1-2:** `OllamaEmbedder` wraps `provider/openaicompat` with the
|
||||
ollama embedding endpoint (`/api/embeddings`).
|
||||
- **F1-3:** Add the `[slm.embedding]` config section to
|
||||
`internal/config/config.go` with the same defaults-via-zero
|
||||
discipline as the rest of the config.
|
||||
- **F1-4:** Wire the embedder into `loop.go` between classifier and
|
||||
selector. Failures log at Debug and don't block routing.
|
||||
- **F1-5:** Append-only feature store in
|
||||
`~/.config/gnoma/router-features.jsonl` with atomic writes,
|
||||
incognito gate, opt-out via `[slm.embedding].enabled = false`.
|
||||
- **F1-6:** Tests covering: embedder mock + observation record;
|
||||
noop embedder produces empty features; incognito skips the
|
||||
store entirely.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2+ — Bandit policy (sketch only; needs data first)
|
||||
|
||||
Spelled out for context. Not for near-term implementation.
|
||||
|
||||
### Feature set per the research
|
||||
|
||||
```
prompt_embedding      — 384-1024 dim depending on model
token_count           — len of tokenized prompt
language              — ISO code from a small lang-detect
has_code              — fenced-block heuristic
has_error_log         — pattern match for stack traces
needs_tools           — from current heuristic
needs_vision          — from [Image:...] markers
estimated_complexity  — current heuristic score
requested_latency     — turn-budget hint (future)
arm_context_window    — from arm metadata
arm_vram_cost         — from arm metadata
arm_avg_latency       — from quality EMA
arm_success_rate      — from quality EMA
```
|
||||
|
||||
### Reward function per the research
|
||||
|
||||
```
reward = quality_score
       - latency_penalty
       - vram_penalty
       - failure_penalty
       - escalation_penalty
```
|
||||
|
||||
- `quality_score`: 1.0 on success, 0.0 on hard error today; richer
|
||||
signal (elf-mediated, user thumbs, tool-call success) once the
|
||||
TODO `Bandit selector — design decisions deferred` resolves.
|
||||
- `latency_penalty`: monotone in observed seconds.
|
||||
- `vram_penalty`: monotone in declared VRAM cost.
|
||||
- `failure_penalty`: hard cost on explicit errors (sandbox
|
||||
denied, parse failed).
|
||||
- `escalation_penalty`: cost when a downstream elf had to escalate
|
||||
to a heavier arm because this arm failed.
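
One way the reward terms above might combine, as a sketch only. The `Outcome` and `Weights` types and the linear penalty shapes are assumptions; the plan only requires the latency and VRAM penalties to be monotone and the failure and escalation penalties to be fixed costs.

```go
package main

import "fmt"

// Outcome captures what was observed for one routed turn (hypothetical type).
type Outcome struct {
	Success        bool
	LatencySeconds float64
	VRAMCostGB     float64
	HardFailure    bool // explicit error such as sandbox denied or parse failed
	Escalated      bool // a downstream elf had to move to a heavier arm
}

// Weights are hypothetical tuning knobs, one per penalty term.
type Weights struct {
	Latency, VRAM, Failure, Escalation float64
}

// reward implements quality_score minus the four penalties, using
// linear penalties as the simplest monotone choice.
func reward(o Outcome, w Weights) float64 {
	r := 0.0
	if o.Success {
		r = 1.0 // today's quality signal: 1.0 on success, 0.0 on hard error
	}
	r -= w.Latency * o.LatencySeconds
	r -= w.VRAM * o.VRAMCostGB
	if o.HardFailure {
		r -= w.Failure
	}
	if o.Escalated {
		r -= w.Escalation
	}
	return r
}

func main() {
	w := Weights{Latency: 0.01, VRAM: 0.02, Failure: 0.5, Escalation: 0.25}
	fast := Outcome{Success: true, LatencySeconds: 2, VRAMCostGB: 1}
	slow := Outcome{Success: true, LatencySeconds: 30, VRAMCostGB: 8}
	fmt.Printf("fast small arm: %.2f\n", reward(fast, w))
	fmt.Printf("slow large arm: %.2f\n", reward(slow, w))
}
```

Unlike the bare success flag, two arms that both succeed now receive different rewards, which gives the bandit something to learn from even before a richer quality signal exists.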
|
||||
|
||||
### Policy
|
||||
|
||||
LinUCB (linear contextual bandit, deterministic exploration
|
||||
bounded by UCB) or Thompson Sampling (Bayesian, smoother
|
||||
exploration). LinUCB is the safer starting point — fewer
|
||||
hyperparameters, well-known behaviour, easier to debug.
|
||||
|
||||
---
|
||||
|
||||
## Risks
|
||||
|
||||
- **Latency.** Embedding inference adds 50–200ms per prompt. Phase
|
||||
1's opt-in default means users see no regression; Phase 2's
|
||||
"make it default" decision requires latency benchmarks first.
|
||||
- **Data sparsity for fine-tuning (Phase 4).** ModernBERT
|
||||
fine-tuning needs ~10k labelled observations to start being
|
||||
useful. Phase 1 might run for months before Phase 4 is viable.
|
||||
Plan B: synthesise labels from existing prompt logs + rule-based
|
||||
pre-labels.
|
||||
- **Off-the-shelf embedding quality.** BGE-M3 / Qwen3-Embedding
|
||||
weren't trained specifically for routing decisions. Phase 4
|
||||
exists precisely to close this gap; Phase 1's data accumulation
|
||||
is what makes Phase 4 possible.
|
||||
- **Architectural complexity.** This plan introduces an entire new
|
||||
ML pipeline (embedder → feature store → bandit → reward loop).
|
||||
Phase 1 keeps it side-by-side with the existing path; Phase 2's
|
||||
"swap" decision is reversible because the existing path stays
|
||||
in code.
|
||||
- **Privacy.** Prompt hashes (not raw prompts) in the feature
|
||||
store. Still a local-only file; same opt-out plumbing as the
|
||||
project registry from the config-migration plan.
|
||||
|
||||
---
|
||||
|
||||
## Open questions
|
||||
|
||||
- **Should the feature store be per-project or global?** Per-project
|
||||
is more privacy-respecting (one project's prompts don't influence
|
||||
another's routing). Global is more data-efficient (more samples
|
||||
→ better bandit). Phase 1 chooses global by default; revisit
|
||||
during Phase 2.
|
||||
- **How does this interact with `[router].prefer = local|cloud`?**
|
||||
Easy answer: prefer policy stays as a hard tier-shift, applied
|
||||
after bandit selection. Bandit picks the best feasible arm; the
|
||||
prefer policy is consulted as a final filter / weight.
|
||||
- **What about CLI-agent subprocess arms?** They proxy to cloud but
|
||||
run locally; today's `prefer` treats them as non-local. Bandit
|
||||
features should include `is_subprocess` as a distinct feature
|
||||
so the policy can learn the user's preferences for those arms
|
||||
independent of local/cloud.
|
||||
- **Cold start.** With no observations, the bandit defaults to
|
||||
pure exploration. Should we seed with the existing heuristic
|
||||
defaults from `internal/router/defaults.go`? Probably yes —
|
||||
warm-start with the curated Strengths as priors.
|
||||
|
||||
---
|
||||
|
||||
## Rollout
|
||||
|
||||
- **Phase 1** ships as v0.5.0 (additive, opt-in, no behaviour
|
||||
change by default). Schema-touching so warrants a minor bump.
|
||||
- **Phase 2** ships when Phase 1 has accumulated enough data
|
||||
(~500–1000 observations per user) — opt-in via
|
||||
`[router].bandit_policy = "linucb"` initially, becoming default
|
||||
in a later release once measured better.
|
||||
- **Phase 3 (deprecation of decoder-SLM classifier)** is a v0.6.x
|
||||
conversation, gated on Phase 2 measurably outperforming.
|
||||
- **Phase 4 (ModernBERT fine-tune)** is v0.7+ — requires the
|
||||
fine-tuned model artifact distributed via Ollama or HF, plus
|
||||
the auto-download story.
|
||||
- **Phase 5 (FunctionGemma sanity layer)** is independent of all
|
||||
of the above; lands when the sibling `tool-router-specialization`
|
||||
plan justifies it on did-switch-rate telemetry.
|
||||
|
||||
---
|
||||
|
||||
## Cross-references
|
||||
|
||||
- TODO.md entry "Bandit selector — design decisions deferred" —
|
||||
the strategic question this plan answers in the long run.
|
||||
- TODO.md entry "Tool-router specialization (functiongemma)" — the
|
||||
sibling track; complementary, not competing.
|
||||
- [`2026-05-23-tool-router-specialization.md`](2026-05-23-tool-router-specialization.md) — FunctionGemma fine-tune plan.
|
||||
- [`2026-05-07-gnoma-roadmap.md`](2026-05-07-gnoma-roadmap.md) §Phase 4 — the original "re-evaluate bandit learning" entry.
|
||||
- 2026-05-25 diagnostic session (this conversation) — the trigger.
|
||||
@@ -0,0 +1,375 @@
|
||||
# Agent Client Protocol (ACP) — 2026-06-04
|
||||
|
||||
Adds **both directions** of ACP to gnoma:
|
||||
|
||||
1. **gnoma as ACP agent (server)** — `gnoma acp` over stdio so any
|
||||
ACP-capable editor (Zed, Kiro, OpenCode, …) can drive gnoma as an
|
||||
external coding agent.
|
||||
2. **gnoma as ACP client** — gnoma spawns *external* ACP agents
|
||||
(Claude, Gemini CLI, Codex, …) and exposes them as router-arm
|
||||
provider backends, the standardized successor to the current
|
||||
`internal/provider/subprocess` CLI-agent arms.
|
||||
|
||||
Adds the TODO.md entry "Agent Client Protocol (ACP) support".
|
||||
|
||||
Upstream: <https://github.com/agentclientprotocol> ·
|
||||
spec <https://agentclientprotocol.com>
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
ACP is "the LSP for AI coding agents": a JSON-RPC 2.0 protocol, spoken
|
||||
over stdio, that lets editors (clients) spawn agents (subprocesses) and
|
||||
talk to them in a standard way — eliminating point-to-point editor↔agent
|
||||
integrations. Zed, Kiro, OpenCode and others are clients; Claude, Gemini
|
||||
CLI, Codex ship as ACP agents.
|
||||
|
||||
Today gnoma is reachable only via its own TUI and pipe mode. It cannot
|
||||
plug into an editor's agent panel. Supporting ACP makes gnoma a drop-in
|
||||
agent inside any ACP client, which is a large distribution surface for
|
||||
near-zero ongoing cost — the protocol is stable and gnoma already owns
|
||||
all the hard parts (an agentic engine, tools, permissions, MCP).
|
||||
|
||||
### Why this is a natural fit
|
||||
|
||||
- gnoma already speaks **JSON-RPC over stdio** for MCP
|
||||
(`internal/mcp/jsonrpc.go` `Request`/`Notification`,
|
||||
`internal/mcp/transport*.go`) — that machinery is reusable for the
|
||||
ACP server side (gnoma is the *server* of the JSON-RPC channel here,
|
||||
the mirror of its MCP-client role).
|
||||
- The agentic loop is already factored behind
|
||||
`session.Session` (`internal/session/session.go:54`,
|
||||
`Local.Send`/`SendWithOptions` at `local.go:80-85`) driving
|
||||
`engine.Engine` (`internal/engine/engine.go`). ACP `session/prompt`
|
||||
maps onto one `Send`.
|
||||
- Permissions already route through a pluggable prompt function
|
||||
(`permission.NewChecker(mode, rules, promptFn)`,
|
||||
`cmd/gnoma/main.go:668`). ACP's `session/request_permission` callback
|
||||
is just another `promptFn` implementation.
|
||||
- ACP `session/new` can declare the `mcpServers` the agent should
|
||||
connect to — gnoma already has an MCP manager
|
||||
(`internal/mcp/manager.go`) to honour that in the same handshake.
|
||||
|
||||
### Role decision — both, server first
|
||||
|
||||
Both roles ship under this plan. Sequence them: **agent (server)
|
||||
first** — it's the larger distribution win and exercises the wire
|
||||
protocol end-to-end — then **client**, which reuses the same
|
||||
`internal/acp` protocol/types from the other side. They share the
|
||||
JSON-RPC framing, content-block translation, and capability structs;
|
||||
only the dispatch direction differs.
|
||||
|
||||
The client role is the standardized successor to
|
||||
`internal/provider/subprocess`: that package shells out to CLI agents
|
||||
with one-shot `--output-format stream-json` (or prompt-augmentation
|
||||
fallback), runs the agent's *own* loop with `--yolo`/`--trust`, and
|
||||
cannot surface structured tool calls (it sets `ToolUse:false` for
|
||||
agents lacking stream-json — see TODO "Native agy JSON output"). ACP
|
||||
fixes all of that: a persistent JSON-RPC session, structured
|
||||
`session/update` tool-call events, real permission round-trips, and
|
||||
cancellation.
|
||||
|
||||
### No Go SDK exists
|
||||
|
||||
Official SDKs are TypeScript, Python, Rust, Kotlin — **no Go**. gnoma
|
||||
implements the wire protocol natively against the published JSON
|
||||
schema. Pin the supported `protocolVersion` and the exact method set
|
||||
against the spec at implementation time (the protocol is young and
|
||||
still moving).
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **A full editor UI.** In agent mode gnoma renders nothing; the client
|
||||
owns the UI. gnoma emits `session/update` notifications and the client
|
||||
displays them.
|
||||
- **Replacing the TUI / pipe modes.** ACP agent mode is a third entry
|
||||
mode alongside them, not a replacement.
|
||||
- **Replacing `internal/provider/subprocess` outright.** The ACP-client
|
||||
provider is added alongside it; the stream-json subprocess path stays
|
||||
for agents that don't (yet) speak ACP. Deprecation is a later call.
|
||||
- **Custom transports.** stdio only (the ACP norm: local agent as a
|
||||
subprocess). No socket/HTTP transport.
|
||||
- **gnoma-drives-gnoma over ACP as the default.** gnoma's native
|
||||
providers/router remain the primary path; ACP-client arms are an
|
||||
additional backend source.
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
The two roles share one package (`internal/acp`): JSON-RPC framing,
|
||||
content-block translation, and the capability/handshake types are
|
||||
direction-agnostic. **Part A** is the agent (server) side; **Part B**
|
||||
is the client side. Build Part A first.
|
||||
|
||||
## Part A — gnoma as ACP agent (server)
|
||||
|
||||
### New entry mode: `gnoma acp`
|
||||
|
||||
Add a third mode beside TUI and pipe (mode is chosen near
|
||||
`cmd/gnoma/main.go:106-114`). Selected by an explicit `acp` subcommand
|
||||
(stdio is shared with the JSON-RPC channel, so it can't be
|
||||
TTY-autodetected the way TUI is). In ACP mode:
|
||||
|
||||
- **No banner, no TUI, no stdout chatter.** stdout/stdin are the
|
||||
JSON-RPC pipe; all human/diagnostic logging goes to **stderr** only
|
||||
(the firewall/audit slog sink must not write to stdout). Audit this
|
||||
carefully — any stray stdout write corrupts the protocol stream.
|
||||
- Reuse the existing session/engine/router/security construction; only
|
||||
the front-end loop differs.
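
The front-end loop described above can be sketched as follows. This is a minimal illustration, not gnoma's code: it assumes one JSON-RPC 2.0 message per line on stdio (check the ACP transport section before relying on that), and every name in it (`request`, `response`, `handler`, `handleLine`, `serve`, `demoHandlers`) is hypothetical. The `protocolVersion` value is a placeholder. Malformed frames are logged and dropped here for brevity; a real server would probably answer with a JSON-RPC parse error.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"log"
	"os"
)

// request and response are minimal JSON-RPC 2.0 envelopes (hypothetical names).
type request struct {
	ID     json.RawMessage `json:"id,omitempty"` // nil for notifications
	Method string          `json:"method"`
	Params json.RawMessage `json:"params,omitempty"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

type handler func(params json.RawMessage) (any, error)

// handleLine decodes one frame and returns the reply to send.
// It returns (nil, nil) for notifications, which never get a reply.
func handleLine(line []byte, handlers map[string]handler) (*response, error) {
	var req request
	if err := json.Unmarshal(line, &req); err != nil {
		return nil, err
	}
	h, known := handlers[req.Method]
	if req.ID == nil {
		if known {
			_, _ = h(req.Params)
		}
		return nil, nil
	}
	resp := &response{JSONRPC: "2.0", ID: req.ID}
	if !known {
		resp.Error = &rpcError{Code: -32601, Message: "method not found"}
		return resp, nil
	}
	result, err := h(req.Params)
	if err != nil {
		resp.Error = &rpcError{Code: -32603, Message: err.Error()}
		return resp, nil
	}
	resp.Result = result
	return resp, nil
}

// serve reads frames from in and writes replies to out.
// Diagnostics go to logger only, never to out.
func serve(in io.Reader, out io.Writer, logger *log.Logger, handlers map[string]handler) error {
	sc := bufio.NewScanner(in)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow large prompts
	enc := json.NewEncoder(out)                       // Encode appends "\n"
	for sc.Scan() {
		resp, err := handleLine(sc.Bytes(), handlers)
		if err != nil {
			logger.Printf("dropping malformed frame: %v", err)
			continue
		}
		if resp == nil {
			continue
		}
		if err := enc.Encode(resp); err != nil {
			return err
		}
	}
	return sc.Err()
}

// demoHandlers registers a single placeholder method for the sketch.
func demoHandlers() map[string]handler {
	return map[string]handler{
		"initialize": func(json.RawMessage) (any, error) {
			return map[string]any{"protocolVersion": 1}, nil
		},
	}
}

func main() {
	logger := log.New(os.Stderr, "gnoma-acp: ", log.LstdFlags) // stderr only
	if err := serve(os.Stdin, os.Stdout, logger, demoHandlers()); err != nil {
		logger.Fatal(err)
	}
}
```

Keeping `handleLine` free of I/O makes the dispatch logic testable without a pipe, and passing the logger explicitly makes it harder for a stray write to reach stdout.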
|
||||
|
||||
### Package layout
|
||||
|
||||
```
internal/acp/
  protocol.go    // ACP types: handshake, capabilities, content blocks (shared)
  jsonrpc.go     // framing reused/forked from internal/mcp/jsonrpc.go (shared)
  content.go     // ContentBlock <-> message.Message translation (shared)
  server.go      // Part A: stdio JSON-RPC read loop; method dispatch
  session.go     // Part A: ACP session <-> gnoma session.Session bridge
  permission.go  // Part A: session/request_permission promptFn
  update.go      // Part A: gnoma stream events -> session/update
  client.go      // Part B: spawn external agent, drive the handshake/prompt
```
|
||||
|
||||
A separate `internal/provider/acp/` holds the **Part B provider**
|
||||
adapter (mirrors `internal/provider/subprocess/`), depending on
|
||||
`internal/acp/client.go`.
|
||||
|
||||
Reuse `internal/mcp/jsonrpc.go` framing if it generalises; otherwise
|
||||
fork the minimal envelope (it's tiny). Keep ACP types separate from MCP
|
||||
types — they are different protocols that happen to share JSON-RPC.
|
||||
|
||||
### Method handlers (agent side)
|
||||
|
||||
Map each ACP method to existing gnoma machinery. Pin exact shapes to the
|
||||
spec; the mapping is the contract:
|
||||
|
||||
| ACP method (client→agent) | gnoma handling |
|---|---|
| `initialize` | Reply with `agentCapabilities` (tools, MCP support, prompt streaming, permission modes), `agentInfo` (name "gnoma", `buildVersion`). Negotiate `protocolVersion`. |
| `session/new` | Build a `session.Local` (router, security, tools wired as in main). Honour `cwd` (run it through `safety.ClassifyCWD`), and connect any `mcpServers` the client declares via `internal/mcp/manager.go`. Return a `sessionId`. |
| `session/load` (if advertised) | Rehydrate from `internal/session` store (`SessionStore.Load`). Optional: only if we advertise the capability. |
| `session/prompt` | Translate ACP `ContentBlock`s → `message.Message`, call `Send`/`SendWithOptions`, stream results back as `session/update`, return the stop reason. |
| `session/cancel` (notification) | Cancel the in-flight turn's context. |
|
||||
|
||||
Agent→client calls gnoma must make:
|
||||
|
||||
| ACP call (agent→client) | Trigger |
|---|---|
| `session/update` (notification) | Per engine stream event: assistant text deltas, tool-call start/args/result, plan/thoughts, token usage. Map gnoma's stream iterator (`Next/Current`) to update variants. |
| `session/request_permission` | gnoma's `permission.Checker` promptFn: instead of console `Scanln`, send this and await the client's allow/deny (with the ACP "allow once / always" options mapped to gnoma permission modes). |
| `fs/read_text_file`, `fs/write_text_file` | **If** we advertise client-side fs and the client supports it, route the `fs` tools through the client so edits show in the editor's buffers. Otherwise gnoma's own `internal/tool/fs` operates on disk directly. Decide per capability negotiation. |
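
The `session/request_permission` row can be sketched as a small adapter. Everything here is an assumption for illustration: the real `promptFn` signature in `internal/permission` is not shown in this plan, the option kinds (`allow_once`, `allow_always`, `reject_once`, `reject_always`) are taken from the ACP schema as I understand it and should be verified against the pinned version, and `decision`, `askFunc`, `mapOutcome`, `newPromptFn` and `brokenClient` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// decision is a hypothetical stand-in for gnoma's permission result.
type decision struct {
	Allow    bool
	Remember bool // persist the choice instead of asking again
}

// mapOutcome converts the option kind the client selected into a decision.
// An empty or unknown kind (for example a cancelled prompt) denies once.
func mapOutcome(kind string) decision {
	switch kind {
	case "allow_once":
		return decision{Allow: true}
	case "allow_always":
		return decision{Allow: true, Remember: true}
	case "reject_always":
		return decision{Remember: true}
	default:
		return decision{}
	}
}

// askFunc performs the session/request_permission round-trip and returns
// the selected option kind. The transport is out of scope for this sketch.
type askFunc func(tool string) (kind string, err error)

// newPromptFn adapts an askFunc into a prompt function that fails closed:
// a transport error is treated as a one-off deny.
func newPromptFn(ask askFunc) func(tool string) decision {
	return func(tool string) decision {
		kind, err := ask(tool)
		if err != nil {
			return decision{}
		}
		return mapOutcome(kind)
	}
}

var errClientGone = errors.New("client disconnected")

// brokenClient simulates an editor that went away mid-request.
func brokenClient(string) (string, error) { return "", errClientGone }

func main() {
	allowAlways := newPromptFn(func(string) (string, error) { return "allow_always", nil })
	fmt.Printf("%+v\n", allowAlways("bash"))
	fmt.Printf("%+v\n", newPromptFn(brokenClient)("bash"))
}
```

Failing closed on a transport error is a design choice of the sketch; it keeps a disconnected editor from implicitly approving a tool call.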
|
||||
|
||||
### Streaming bridge
|
||||
|
||||
The engine produces a pull-based stream
(`Next() / Current() / Err() / Close()`). The ACP bridge consumes it and
emits a `session/update` per event. Backpressure: `session/update` is a
fire-and-forget notification, so gnoma never waits for a reply. If the
client reads slowly, coalesce text deltas (config knob; the default is
to flush per token).
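
A minimal sketch of that bridge, under stated assumptions: `event`, `stream`, `sliceStream` and `bridge` are hypothetical stand-ins (gnoma's real stream types live in `internal/engine` / `internal/stream` and will differ), and the coalescing knob is modelled as a byte threshold, which is only one possible policy.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a simplified stream event: Kind is "text" or anything else.
type event struct {
	Kind string
	Text string
}

// stream mirrors the pull-based iterator shape named in the plan.
type stream interface {
	Next() bool
	Current() event
	Err() error
	Close() error
}

// sliceStream replays a fixed list of events (a test double).
type sliceStream struct {
	events []event
	pos    int
}

func (s *sliceStream) Next() bool     { s.pos++; return s.pos <= len(s.events) }
func (s *sliceStream) Current() event { return s.events[s.pos-1] }
func (s *sliceStream) Err() error     { return nil }
func (s *sliceStream) Close() error   { return nil }

// bridge drains st and calls emit once per outgoing update. Consecutive text
// deltas are merged until at least minFlush bytes are buffered; minFlush <= 1
// flushes every non-empty delta on its own.
func bridge(st stream, minFlush int, emit func(event)) error {
	defer st.Close()
	var buf strings.Builder
	flush := func() {
		if buf.Len() > 0 {
			emit(event{Kind: "text", Text: buf.String()})
			buf.Reset()
		}
	}
	for st.Next() {
		ev := st.Current()
		if ev.Kind == "text" {
			buf.WriteString(ev.Text)
			if buf.Len() >= minFlush {
				flush()
			}
			continue
		}
		flush() // pending text goes out before the non-text event
		emit(ev)
	}
	flush()
	return st.Err()
}

func main() {
	st := &sliceStream{events: []event{
		{Kind: "text", Text: "Hel"},
		{Kind: "text", Text: "lo"},
		{Kind: "tool_call", Text: "fs.read"},
		{Kind: "text", Text: "!"},
	}}
	err := bridge(st, 16, func(e event) { fmt.Printf("%s %q\n", e.Kind, e.Text) })
	if err != nil {
		fmt.Println("stream error:", err)
	}
}
```

Flushing buffered text before any non-text event preserves the ordering the client sees, which matters once tool calls are interleaved with assistant text.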
|
||||
|
||||
### Security & safety interplay
|
||||
|
||||
- The `SafeProvider` firewall boundary and the per-session audit log
|
||||
apply unchanged — ACP is a front-end, providers/tools sit behind the
|
||||
same security layer.
|
||||
- `safety.ClassifyCWD` runs on the `session/new` `cwd`; a `refuse`
|
||||
classification returns an ACP error rather than starting the session.
|
||||
- Egress allowlist (`2026-06-04-egress-allowlist.md`) applies as usual.
|
||||
- Incognito: expose a way to start an ACP session incognito (capability
|
||||
flag or `session/new` param) so editor-driven sessions can be
|
||||
non-persistent.
|
||||
|
||||
### MCP-in-ACP
|
||||
|
||||
When `session/new` lists `mcpServers`, spin them up through the existing
|
||||
manager so the editor's MCP config and gnoma's converge in one
|
||||
handshake (this is the headline ACP×MCP integration). gnoma's own
|
||||
config-level MCP servers still load too; merge, don't replace.
|
||||
|
||||
---
|
||||
|
||||
## Part B — gnoma as ACP client (external agents as router arms)
|
||||
|
||||
gnoma connects to external ACP agents and exposes each as a router-arm
|
||||
backend, the standardized successor to `internal/provider/subprocess`.
|
||||
gnoma plays the *client* (editor) side of the JSON-RPC channel.
|
||||
|
||||
### Provider adapter
|
||||
|
||||
Add `internal/provider/acp/` implementing the `provider.Provider`
|
||||
contract (`Stream`, `Name`, `Models`, `DefaultModel`) — the same surface
|
||||
the subprocess provider satisfies
|
||||
(`internal/provider/subprocess/provider.go:28-62`):
|
||||
|
||||
- **Spawn + handshake.** On first use (or at discovery), spawn the agent
|
||||
subprocess (`exec.CommandContext`, with the Windows/Unix process-group
|
||||
handling from `2026-06-04-cross-platform.md`), send `initialize` as the
|
||||
client, then `session/new` with gnoma's `cwd` and — crucially —
|
||||
gnoma's *own* MCP servers passed through as the `mcpServers` list so
|
||||
the external agent shares gnoma's tool surface.
|
||||
- **`Stream` → `session/prompt`.** Translate the gnoma `Request`
|
||||
messages into ACP `ContentBlock`s, send `session/prompt`, and turn the
|
||||
incoming `session/update` notifications back into gnoma's pull-based
|
||||
stream events (`EventTextDelta`, structured tool-call events, usage).
|
||||
This is the win over the subprocess provider: tool calls arrive
|
||||
**structured**, not as opaque `EventTextDelta` text.
|
||||
- **Permission callbacks.** The external agent sends
|
||||
`session/request_permission` to gnoma (now the client). Route these
|
||||
through gnoma's existing `permission.Checker` so the *user's* gnoma
|
||||
permission policy governs the sub-agent — a strict improvement over
|
||||
today's `--yolo`/`--trust` subprocess invocations that bypass gnoma's
|
||||
gate entirely.
|
||||
- **`fs/*` callbacks.** Route the agent's file reads/writes through
|
||||
gnoma's `internal/tool/fs` guard so the path-safety boundary still
|
||||
applies.
|
||||
- **Cancellation.** gnoma's turn-cancel sends ACP `session/cancel`.
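
The `Stream` → `session/prompt` translation in the list above might look like the following on the receiving side. This is a sketch under assumptions: the JSON field names (`sessionUpdate`, `agent_message_chunk`, `tool_call`, `toolCallId`, `title`) follow the ACP schema as I understand it and must be checked against the pinned version, while `updateParams`, `streamEvent` and `translate` are hypothetical names rather than gnoma's real event types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateParams is the params object of a session/update notification.
// Content stays raw because its shape depends on the update kind.
type updateParams struct {
	SessionID string `json:"sessionId"`
	Update    struct {
		Kind       string          `json:"sessionUpdate"`
		Content    json.RawMessage `json:"content"`
		ToolCallID string          `json:"toolCallId"`
		Title      string          `json:"title"`
	} `json:"update"`
}

// streamEvent is a stand-in for gnoma's pull-based stream event.
type streamEvent struct {
	Type       string // "text_delta" or "tool_call" in this sketch
	Text       string
	ToolCallID string
	Title      string
}

// translate maps one session/update payload to a stream event.
// The bool is false for update kinds this sketch does not handle.
func translate(params []byte) (streamEvent, bool, error) {
	var p updateParams
	if err := json.Unmarshal(params, &p); err != nil {
		return streamEvent{}, false, err
	}
	switch p.Update.Kind {
	case "agent_message_chunk":
		var block struct {
			Type string `json:"type"`
			Text string `json:"text"`
		}
		if err := json.Unmarshal(p.Update.Content, &block); err != nil {
			return streamEvent{}, false, err
		}
		if block.Type != "text" {
			return streamEvent{}, false, nil
		}
		return streamEvent{Type: "text_delta", Text: block.Text}, true, nil
	case "tool_call":
		return streamEvent{
			Type:       "tool_call",
			ToolCallID: p.Update.ToolCallID,
			Title:      p.Update.Title,
		}, true, nil
	default:
		return streamEvent{}, false, nil
	}
}

func main() {
	raw := []byte(`{"sessionId":"s1","update":{"sessionUpdate":"tool_call","toolCallId":"c1","title":"Read main.go"}}`)
	ev, ok, err := translate(raw)
	fmt.Println(ev.Type, ev.ToolCallID, ok, err)
}
```

The point of the sketch is the structured tool-call branch: the event carries an ID and a title instead of arriving as opaque text.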
|
||||
|
||||
### Discovery & registration
|
||||
|
||||
Mirror the subprocess flow (`cmd/gnoma/main.go:521-531`):
|
||||
|
||||
- Discover ACP agents from config (`[acp.agents]` — command + args +
|
||||
optional capability hints) and/or a known-agents table analogous to
|
||||
`subprocess/agent.go:60` (`knownAgents`).
|
||||
- Register each as a `router.Arm` (a new `IsACPAgent` flag, or reuse
|
||||
`IsCLIAgent` with a transport discriminant). Set `Capabilities` from
|
||||
the ACP `initialize` response — notably `ToolUse:true`, which the
|
||||
subprocess provider often can't claim.
|
||||
- Wrap in `security.WrapProvider(..., fwRef)` exactly like every other
|
||||
arm so the firewall + audit + egress boundaries hold.
|
||||
|
||||
### Relationship to the subprocess provider
|
||||
|
||||
Additive. Agents that speak ACP (Claude, Gemini CLI, Codex increasingly
|
||||
do) get the ACP arm; agents that only do one-shot stream-json keep the
|
||||
subprocess arm. Where both exist for one binary, prefer ACP. This also
|
||||
unblocks the "Native agy JSON output" backlog item for any agent that
|
||||
exposes ACP instead of `--output-format stream-json`.
|
||||
|
||||
---
|
||||
|
||||
## Touch-points (file:line)
|
||||
|
||||
**Part A — agent (server):**
|
||||
|
||||
| Change | Location |
|---|---|
| New ACP package | `internal/acp/` |
| Entry mode dispatch | `cmd/gnoma/main.go` (mode select ~`:106`, subcommand dispatch ~`:178`) |
| stdout→stderr log discipline | logger setup (`main.go:100-114`) |
| Session bridge | `internal/session` (`Session`/`Local`) |
| Permission callback | `internal/permission` checker promptFn (`main.go:645-668`) |
| Stream→update | engine stream iterator (`internal/engine`, `internal/stream`) |
| MCP per-session | `internal/mcp/manager.go` |
| JSON-RPC framing reuse | `internal/mcp/jsonrpc.go` |
|
||||
|
||||
**Part B — client (external agents as arms):**
|
||||
|
||||
| Change | Location |
|---|---|
| ACP-client provider | new `internal/provider/acp/` (mirrors `internal/provider/subprocess/`) |
| Client handshake/driver | `internal/acp/client.go` |
| Arm discovery + registration | `cmd/gnoma/main.go:521-531` (subprocess pattern), `[acp.agents]` config |
| Known-agents table | analogous to `internal/provider/subprocess/agent.go:60` |
| Arm flag | `router.Arm` (`IsACPAgent`, or `IsCLIAgent` + transport) |
| Security wrap | `security.WrapProvider(..., fwRef)` |
|
||||
|
||||
---
|
||||
|
||||
## Testing (TDD — write first)
|
||||
|
||||
- **Protocol unit tests (no real provider):**
|
||||
- `initialize` handshake: version negotiation, advertised
|
||||
capabilities are stable and accurate.
|
||||
- `session/new` → returns a sessionId; honours `cwd`; rejects a
|
||||
`refuse`-classified cwd with an ACP error.
|
||||
- `session/prompt` with a stubProvider: ContentBlocks translate in,
|
||||
`session/update`s stream out in order, correct stop reason.
|
||||
- `session/cancel` aborts the in-flight turn (context cancellation
|
||||
observed).
|
||||
- Permission: a tool call triggers `session/request_permission`; a
|
||||
"deny" response blocks the tool; "allow always" updates the mode.
|
||||
- **stdout purity test:** drive a full prompt and assert stdout
|
||||
contains *only* valid JSON-RPC frames (no banner/log leakage) — this
|
||||
is the most common ACP-agent bug.
|
||||
- **Conformance:** run gnoma against the upstream ACP test client /
|
||||
example client (Rust/TS) in a `//go:build integration` test if one is
|
||||
available; otherwise a recorded-transcript fixture.
|
||||
- **MCP-in-ACP:** `session/new` with an `mcpServers` entry spins the
|
||||
server up and its tools become callable in that session.
|
||||
- **Part B (client) unit tests** — drive a *fake ACP agent* (a small
|
||||
in-process JSON-RPC responder, the mirror of the agent-side tests):
|
||||
- Provider `Stream` performs `initialize`+`session/new`+`session/prompt`
|
||||
and yields gnoma stream events in order, with **structured** tool-call
|
||||
events (not opaque text).
|
||||
- An inbound `session/request_permission` is routed through
|
||||
`permission.Checker` and a deny blocks the call.
|
||||
- An inbound `fs/write_text_file` is mediated by the `internal/tool/fs`
|
||||
guard (a guarded path is refused).
|
||||
- Turn cancel emits `session/cancel`; the subprocess is reaped (tie to
|
||||
cross-platform process-group handling).
|
||||
- Discovery registers a fake ACP agent as an arm with `ToolUse:true`.
|
||||
- **Round-trip (loopback):** point gnoma's ACP-*client* at a `gnoma acp`
|
||||
*server* subprocess and run a prompt end-to-end — exercises both parts
|
||||
over a real stdio pipe.
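
The stdout purity test above needs a checker for captured output. A minimal sketch, assuming newline-delimited framing (to be confirmed against the ACP transport spec); `checkFrames` is a hypothetical helper name, not an existing gnoma function.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkFrames returns nil only if every line of captured stdout is a JSON
// object that declares "jsonrpc": "2.0". Any banner, log line or blank line
// makes it fail with the offending line number.
func checkFrames(out string) error {
	lines := strings.Split(strings.TrimSuffix(out, "\n"), "\n")
	for i, line := range lines {
		var frame struct {
			JSONRPC string `json:"jsonrpc"`
		}
		if err := json.Unmarshal([]byte(line), &frame); err != nil {
			return fmt.Errorf("line %d is not a JSON object: %q", i+1, line)
		}
		if frame.JSONRPC != "2.0" {
			return fmt.Errorf("line %d is not a JSON-RPC 2.0 frame: %q", i+1, line)
		}
	}
	return nil
}

func main() {
	clean := `{"jsonrpc":"2.0","id":1,"result":{}}` + "\n"
	fmt.Println(checkFrames(clean))
	fmt.Println(checkFrames("gnoma starting...\n" + clean))
}
```

In a real test the string would come from the captured stdout of a `gnoma acp` run driven through one full prompt.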
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
**Part A (agent/server):**
|
||||
|
||||
1. `gnoma acp` speaks the handshake and a full prompt turn over stdio.
|
||||
2. gnoma appears and works as an external agent in Zed (manual: add
|
||||
gnoma to Zed's external-agents config, run a prompt, approve a tool).
|
||||
3. Tool permission prompts surface in the client and gate execution.
|
||||
4. stdout carries only JSON-RPC; all logs go to stderr.
|
||||
5. Cancelling from the editor stops the turn.
|
||||
6. MCP servers declared by the client in `session/new` are available in
|
||||
that session.
|
||||
|
||||
**Part B (client):**
|
||||
|
||||
7. An external ACP agent configured under `[acp.agents]` appears as a
|
||||
router arm (`gnoma providers` lists it) with `ToolUse:true`.
|
||||
8. Routing a task to that arm runs a full turn via ACP, surfacing the
|
||||
sub-agent's tool calls **structured** in gnoma's stream.
|
||||
9. The sub-agent's permission requests are gated by the user's gnoma
|
||||
permission policy (not auto-approved).
|
||||
10. The sub-agent's file writes pass through gnoma's fs guard.
|
||||
11. Loopback: `gnoma acp` driven by gnoma's own ACP-client completes a
|
||||
prompt end-to-end.
|
||||
|
||||
---
|
||||
|
||||
## Open questions (resolve against the live spec at implementation)
|
||||
|
||||
- Exact `protocolVersion` to target and the precise capability struct
|
||||
shapes (the schema is the source of truth; pin a version).
|
||||
- Whether to advertise client-side `fs/*` (edits flow through the
|
||||
editor's buffers) vs. direct-disk fs tools — depends on parity and on
|
||||
how gnoma's `internal/tool/fs` guard composes with editor-mediated
|
||||
writes.
|
||||
- `session/load` support (needs our session store to round-trip the
|
||||
ACP transcript shape).
|
||||
- **(Part B)** How a sub-agent's own model/cost is represented in the
|
||||
router — an ACP arm's tokens are billed by *that* agent, so
|
||||
`CostWeight`/`CostPer1k*` are opaque. Likely model it like the
|
||||
subprocess arms (no metered cost; selection driven by `Strengths`).
|
||||
- **(Part B)** Lifecycle: spawn-per-session vs. a pooled long-lived
|
||||
agent process reused across turns; how cancellation and crashes are
|
||||
recovered (ties to session error-recovery, `0d3d190`).
|
||||
|
||||
---
|
||||
|
||||
## TODO linkage
|
||||
|
||||
New "Agent Client Protocol (ACP) support" entry in `TODO.md` (In
|
||||
flight) links here. Covers **both** roles: gnoma as ACP agent (Part A)
|
||||
and gnoma as ACP client driving external agents as router arms
|
||||
(Part B). Part B is the standardized successor to
|
||||
`internal/provider/subprocess` and overlaps the "Native agy JSON
|
||||
output" backlog item.
|
||||
@@ -0,0 +1,156 @@
|
||||
# Config Migration — Follow-ups from Phase 1 (2026-06-04)
|
||||
|
||||
Caveats discovered while shipping Phase 1 of
|
||||
[`2026-05-24-config-migration.md`](2026-05-24-config-migration.md) in
|
||||
commit `a9bba42`. The encoder-fix half is in; the issues below are
|
||||
either Phase 2+ of the same plan or adjacent cleanup that's now
|
||||
exposed because the file is being read more carefully than before.
|
||||
|
||||
## Caveat 1 — `Duration` fields still emit zero-spam as raw int64
|
||||
|
||||
**Where:** `internal/config/config.go:50, 57` —
|
||||
`SLM.StartupTimeout Duration` and `SLM.ClassifyTimeout Duration`.
|
||||
|
||||
**Symptom:** Running `gnoma config set --global slm.enabled true`
|
||||
on a fresh global config produces:
|
||||
|
||||
```toml
[slm]
enabled = true
startup_timeout = 0
classify_timeout = 0
```
|
||||
|
||||
`startup_timeout = 0` and `classify_timeout = 0` are emitted even
with `,omitempty` on the struct tags. The `Duration` type only has
`UnmarshalText` (`config.go:393`) and no `MarshalText`, so
BurntSushi falls back to encoding the underlying `int64` nanosecond
value. BurntSushi's `omitempty` does not treat a zero integer as
empty, so the tag has no effect on these fields.
|
||||
|
||||
**Why it's pre-existing:** The original `setConfig` predates the
|
||||
`omitempty` work in Phase 1. The encoder always wrote the full
|
||||
struct, so the Duration-as-int64 behavior was always there but
|
||||
masked by the surrounding zero-spam from other fields.
|
||||
|
||||
**Severity:** Cosmetic. `0` is the documented "use built-in
|
||||
default" sentinel for both fields — `defaultClassifyTimeout = 15s`
|
||||
in `internal/slm/classifier.go:23` and the llamafile startup
|
||||
timeout defaults to 5s. So the file's `0` values are semantically
|
||||
equivalent to absent; the resolver passes them through unchanged.
|
||||
|
||||
**Fix (small PR, ~30 lines):**
|
||||
|
||||
Convert the two Duration fields to `*Duration` (pointer), matching
the seven fields already converted in Phase 1. nil = "use
default"; `*Duration(0)` = "explicit zero". This PR also has to add
the `ResolvedSLMSection` mirror, because the SLM section is
currently un-mirrored: Phase 1 only mirrored Provider / Tools /
Security / Router / Session / Hooks, the sections that had
pointer-converted fields.
|
||||
|
||||
Implementation steps:
|
||||
|
||||
1. `SLM.StartupTimeout *Duration` and `SLM.ClassifyTimeout *Duration`
|
||||
in `internal/config/config.go`.
|
||||
2. `Defaults()` populates them with the documented defaults
|
||||
(`5s` and `0s` respectively — note the `*Duration(0)` for
|
||||
ClassifyTimeout is intentional: 0 means "let the SLM layer
|
||||
pick its own 15s default", per the existing field comment).
|
||||
3. Add `ResolvedSLMSection` to `internal/config/resolve.go`. Update
|
||||
`ResolvedConfig` to include it. Hook all existing SLM readers
|
||||
(cmd/gnoma/main.go:865-870, 884, 1525, 1554-1561, 1617-1657;
|
||||
internal/tui/app.go:245) through the mirror.
|
||||
4. Test: `TestSetGlobalConfig_DurationFieldOmitsAtZero` — set
|
||||
`slm.enabled = true`, assert the file does NOT contain
|
||||
`startup_timeout` or `classify_timeout`.
|
||||
5. Update `internal/config/config_test.go:454-499` (the three
|
||||
`TestSLMSection_RegisterAsArm_*` tests) to keep working with
|
||||
the new pointer types — they're load-side tests and just need
|
||||
nil-or-deref assertions.
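
The pointer conversion in the steps above can be sketched with the standard library alone. Assumptions, stated loudly: `resolve` is a hypothetical helper (the real resolver would live in `internal/config/resolve.go`), the struct tags are copied from the plan but no TOML library is exercised here, and the `MarshalText` method is an addition of this sketch, not one of the listed steps. It is included only to show a text round-trip for explicitly set values.

```go
package main

import (
	"fmt"
	"time"
)

// Duration wraps time.Duration so config files can use strings like "15s".
type Duration time.Duration

// UnmarshalText parses values such as "15s" or "1m30s".
func (d *Duration) UnmarshalText(text []byte) error {
	v, err := time.ParseDuration(string(text))
	if err != nil {
		return err
	}
	*d = Duration(v)
	return nil
}

// MarshalText renders the duration as text (assumption: not in the current code).
func (d Duration) MarshalText() ([]byte, error) {
	return []byte(time.Duration(d).String()), nil
}

// SLM shows the two fields after the pointer conversion.
type SLM struct {
	StartupTimeout  *Duration `toml:"startup_timeout,omitempty"`
	ClassifyTimeout *Duration `toml:"classify_timeout,omitempty"`
}

// resolve returns def when the field is absent (nil) and the explicit value,
// including an explicit zero, otherwise.
func resolve(p *Duration, def time.Duration) time.Duration {
	if p == nil {
		return def
	}
	return time.Duration(*p)
}

func main() {
	var cfg SLM // nothing set in the file
	fmt.Println(resolve(cfg.StartupTimeout, 5*time.Second))

	explicit := Duration(0)
	cfg.ClassifyTimeout = &explicit // explicit zero survives resolution
	fmt.Println(resolve(cfg.ClassifyTimeout, 15*time.Second))
}
```

The distinction the sketch illustrates is nil versus explicit zero: only nil falls back to the default, so an encoder that omits nil pointers no longer writes the two `0` lines.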
|
||||
|
||||
Risk: low. The SLM section is read in many places, but the
|
||||
`Defaults()` baseline is updated at the same time so the
|
||||
*resolved* values are byte-identical to today's behavior.
|
||||
|
||||
## Caveat 2 — Pre-existing zero-spam is not auto-cleaned
|
||||
|
||||
**Where:** Any user config file that was written by a `gnoma`
|
||||
release predating `a9bba42`. The 2026-05-24 symptom was the
|
||||
project file containing `[router] prefer = ""` after an earlier
|
||||
`gnoma config set ...` call.
|
||||
|
||||
**Phase 1 behavior:** `setConfig` continues to round-trip the
|
||||
file: read existing → decode overlays the struct → apply one
|
||||
change → write back. The `,omitempty` tags mean a field that was
|
||||
*absent* from the source is not emitted. A field that was
|
||||
*present-but-zero* in the source is still re-emitted as zero
|
||||
(the decoder sees it, the encoder writes it back).
|
||||
|
||||
**User's recovery path today:** Re-set the affected key, e.g.
|
||||
`gnoma config set router.prefer cloud`. The decoder reads
|
||||
`prefer = ""` into the struct, the setter overwrites it with
|
||||
`"cloud"`, the encoder writes `prefer = "cloud"`. The zero-spam
|
||||
is gone — for that field, on that file. Other zero-spam in the
|
||||
same file stays until the user re-sets each affected key
|
||||
individually.
|
||||
|
||||
**Why this isn't in Phase 1:** the alternative — "drop fields
|
||||
whose value equals the default" — is a *read-modify-write* of the
|
||||
existing file that needs to know which keys were present in the
|
||||
source. BurntSushi's encoder doesn't expose that; the plan defers
|
||||
it to `gnoma upgrade-config` (Phase 4).
|
||||
|
||||
**Fix (the Phase 4 plan, ~200 lines):** `gnoma upgrade-config`
|
||||
with per-file backup, diff output, and `--all-projects` mode.
|
||||
Out of scope for this follow-up doc; lives in the original
|
||||
[`2026-05-24-config-migration.md` Phase 4 section](2026-05-24-config-migration.md#phase-4--gnoma-upgrade-config).
|
||||
|
||||
**What this caveat doc *does* add:** a one-line README note under
|
||||
the config section flagging that pre-`a9bba42` config files may
|
||||
have accumulated zero-spam, and pointing at `gnoma upgrade-config`
|
||||
as the cleanup tool once it ships.
|
||||
|
||||
## Caveat 3 — `BanditSection` keeps the 0-sentinel pattern
|
||||
|
||||
**Where:** `internal/config/config.go:194-215` — QualityAlpha,
|
||||
MinObservations, ObservedWeight, StrengthBonus.
|
||||
|
||||
**Status:** intentional, kept as-is per the Phase 1 plan. The
|
||||
doc comments on each field document 0 as "use default" and the
|
||||
consumers (`internal/router/feedback.go`, `selector.go`) already
|
||||
handle 0-sentinel values. Pointer conversion would force every
|
||||
reader to deref for a knob that nobody sets by hand.
|
||||
|
||||
**Fix:** none planned. The risk if anyone ever does set these
|
||||
explicitly to 0 (intending "off" or "no effect") is the same
|
||||
silent-shadowing pattern Phase 1 fixed elsewhere — but the
|
||||
comment-documented 0-sentinel is a deliberate contract here.
|
||||
Documented so the next person reviewing the code doesn't try to
|
||||
"fix" it.
|
||||
|
||||
## Ordering and dependencies
|
||||
|
||||
| # | Item | Depends on | Estimated size |
|---|---|---|---|
| 1 | Duration pointer conversion | nothing | 1 PR, ~30 lines |
| 2 | `gnoma upgrade-config` (Phase 4) | nothing | 1 PR, ~200 lines |
| 3 | Project registry (Phase 2) | nothing | 1 PR, ~150 lines |
| 4 | `gnoma doctor` (Phase 3) | Project registry (Phase 2) | 1 PR, ~250 lines |
| 5 | Auto-migration (Phase 5) | Phases 1-4 in production | deferred one release |
|
||||
|
||||
Phase 2 (registry) and Phase 3 (doctor) are independent of the
Duration fix and of `upgrade-config`, but doctor without a
registry has to fall back to a filesystem scan, which is slow on
big machines. Land the registry first.
|
||||
|
||||
## Not in this doc
|
||||
|
||||
- Sensitive-content policy (separate plan:
|
||||
[`2026-05-24-sensitive-content-policy.md`](2026-05-24-sensitive-content-policy.md))
|
||||
- Egress allowlist (separate plan:
|
||||
[`2026-06-04-egress-allowlist.md`](2026-06-04-egress-allowlist.md))
|
||||
- MiniMax provider (separate plan:
|
||||
[`2026-06-04-minimax-provider.md`](2026-06-04-minimax-provider.md))
|
||||
- ACP (separate plan:
|
||||
[`2026-06-04-agent-client-protocol.md`](2026-06-04-agent-client-protocol.md))
|
||||
@@ -0,0 +1,198 @@
|
||||
# Cross-Platform Support (Windows + macOS) — 2026-06-04
|
||||
|
||||
Makes the Windows and macOS binaries — which GoReleaser already builds
|
||||
for `linux/darwin/windows × amd64/arm64` but only Linux exercises —
|
||||
actually work and stay working. Promotes the TODO.md entry
|
||||
"Cross-platform support — Windows + macOS" into a phased design with
|
||||
concrete code touch-points.
|
||||
|
||||
This plan does not restate the TODO's r/devops question map (Phase 2
|
||||
table there stands). Its value-add is the **specific code locations**
|
||||
that need OS-conditional handling and the build-tag pattern to use.
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
Only Linux is tested. The binaries ship for Windows/macOS untested, and
|
||||
the codebase has several hard Unix assumptions that will fail or
|
||||
silently misbehave off-Linux. The pattern to follow already exists:
|
||||
`internal/mcp/transport_{unix,windows}.go` split via build tags.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **MSI installer, Authenticode/Gatekeeper signing.** Covered by
|
||||
`2026-06-04-distribution-followups.md` — those are packaging, not
|
||||
runtime correctness.
|
||||
- **Group Policy / Event Viewer integration.** Out of scope per the
|
||||
TODO; documentation-only.
|
||||
- **WSL-specific tuning.** WSL is Linux; it works today.
|
||||
|
||||
---
|
||||
|
||||
## Confirmed Unix-assumption defects (file:line)
|
||||
|
||||
### Critical — break core functionality on Windows
|
||||
|
||||
1. **Bash tool hardcodes `bash -c`.**
|
||||
`internal/tool/bash/bash.go:117` →
|
||||
`exec.CommandContext(ctx, "bash", "-c", command)`. No Windows shell.
|
||||
Alias harvesting (`internal/tool/bash/aliases.go:115,148`) hardcodes
|
||||
`/bin/bash` and splits the shell path on `/`.
|
||||
2. **Llamafile SLM startup hardcodes `sh`.**
   `internal/slm/manager.go:172` invokes `sh <llamafile>` (a Wine
   binfmt workaround). `sh` is absent on native Windows →
   `gnoma slm status` and `gnoma slm setup` fail outright.
|
||||
3. **MCP process-tree kill is a Windows stub.**
|
||||
`internal/mcp/transport_windows.go:10-18` — `setProcessGroup` is a
|
||||
no-op and `killProcessTree` calls `p.Kill()`, leaking any child
|
||||
processes an MCP server spawns. Unix version uses process groups
|
||||
(`transport_unix.go:11-18`).
|
||||
|
||||
### High — config/auth land in the wrong place off-Linux
|
||||
|
||||
4. **Config/data dirs assume XDG.**
|
||||
`internal/config/load.go:52-59` falls back to `~/.config`;
|
||||
`internal/slm/manager.go:25-35` falls back to `~/.local/share`. On
|
||||
Windows these should be `os.UserConfigDir()` (`%AppData%`) /
|
||||
`os.UserCacheDir()`. On macOS, native tools use
|
||||
`~/Library/Application Support`, though `~/.config` is tolerable;
|
||||
decide and document.
|
||||
5. **OAuth credential discovery is Unix-pathed.**
|
||||
`internal/provider/google/provider.go:188-204` hardcodes
|
||||
`~/.config/...` and `~/.gemini/...`. `expandHome` (`:114-129`)
|
||||
already handles `\`, but the path *set* is Unix-centric — Gemini/
|
||||
Antigravity creds on macOS/Windows won't be found.
|
||||
6. **No system-proxy support.** No `http.ProxyFromEnvironment` wiring
|
||||
found. Go stdlib reads `HTTP(S)_PROXY` env vars but **not** the
|
||||
Windows system proxy / PAC. Corporate Windows networks rely on these.
|
||||
|
||||
### Medium — usability / safety classifier gaps
|
||||
|
||||
7. **`internal/safety/cwd.go`** macOS system roots
|
||||
(`:185-210`) miss `/opt`, `/usr/local`; personal-dir detection
|
||||
(`:221-252`) misses Windows `%TEMP%`/`%APPDATA%` and macOS
|
||||
`~/Library/...`.
|
||||
8. **Terminal/ANSI.** TUI uses lipgloss/termenv (auto-detects), so
|
||||
modern Windows Terminal/PowerShell 7 are fine; legacy `conhost.exe`
|
||||
may mangle. Verify, don't assume.
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
### Phase 0 — build-tag scaffolding
|
||||
|
||||
Adopt the existing `_unix.go` / `_windows.go` split (as in
|
||||
`internal/mcp`) for each defect that needs divergent behaviour. Prefer
|
||||
`runtime.GOOS` only for small inline branches (as
|
||||
`internal/safety/cwd.go:201` already does); use build tags when the
|
||||
implementation genuinely differs (shell selection, process kill).
|
||||
|
||||
### Phase 1 — smoke tests (unblocks the honest "did you test it?" answer)
|
||||
|
||||
Non-blocking GitHub Actions matrix (`windows-latest`, `macos-latest`,
|
||||
`ubuntu-latest`):
|
||||
|
||||
- `go build ./...` and `go test ./...` per OS (today the release
|
||||
workflow tests Linux only — `.github/workflows/release.yml`).
|
||||
- Post-release: download each archive, run `gnoma --version` and a
  stubbed `echo hi | gnoma --provider ollama` against a fake endpoint.
  Confirms the binary launches and a pipe-mode turn completes without
  crashing (piped stdin does not exercise the TUI).
|
||||
|
||||
This is the precondition the TODO names for posting to r/devops.
|
||||
|
||||
### Phase 2 — shell abstraction (defects #1, #2)
|
||||
|
||||
1. Introduce `internal/tool/bash/shell_unix.go` /
   `shell_windows.go` exposing
   `defaultShell() (name string, args []string)` and a
   `quoteArg(string) string`:
   - Unix: `bash`/`$SHELL`, `-c`, POSIX quoting.
   - Windows: prefer `pwsh`/`powershell` with the appropriate
     `-Command` invocation and PowerShell quoting rules; fall back to
     `cmd /c`. Document the choice.
|
||||
2. Fix `aliases.go` to use `filepath.Base` instead of splitting on `/`,
|
||||
and skip alias harvesting on Windows shells that have no equivalent.
|
||||
3. Llamafile: on Windows, invoke the `.llamafile` (which is a valid
|
||||
Windows PE as well as a shell script) directly rather than via `sh`;
|
||||
guard with a build tag.
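The quoting half of step 1 can be sketched as follows. This is a minimal illustration, not the project's implementation: `posixQuote`, `powershellQuote`, and `defaultShellFor` are hypothetical names, the OS is passed as a parameter so the logic is testable anywhere (the plan itself calls for build-tagged files), and the `cmd /c` fallback is omitted.

```go
package main

import "strings"

// posixQuote wraps s in single quotes for a POSIX shell. A single quote
// cannot appear inside single quotes, so each one is written as '\''.
func posixQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// powershellQuote wraps s in a PowerShell single-quoted (verbatim) string,
// where an embedded single quote is escaped by doubling it.
func powershellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// defaultShellFor returns the shell binary and the flags that precede the
// command string. goos and shellEnv are parameters (hypothetical design)
// so the choice can be unit-tested without build tags.
func defaultShellFor(goos, shellEnv string) (name string, args []string) {
	if goos == "windows" {
		return "pwsh", []string{"-NoProfile", "-Command"}
	}
	if shellEnv != "" {
		return shellEnv, []string{"-c"}
	}
	return "bash", []string{"-c"}
}
```

Quoting for PowerShell launched from another process also passes through Windows command-line parsing, so the round-trip test named in the Testing section is what actually validates the rules.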
|
||||
|
||||
### Phase 3 — process management (defect #3)
|
||||
|
||||
Implement Windows job objects via `golang.org/x/sys/windows` in
|
||||
`transport_windows.go` (and any other subprocess owner —
|
||||
`internal/provider/subprocess`, `internal/tool/bash`): create a job,
|
||||
assign the child, `TerminateJobObject` on close to reap the whole tree.
|
||||
Shared helper so MCP and bash tool both get tree-kill. (This is the
|
||||
same item the distribution TODO references.)
|
||||
|
||||
### Phase 4 — paths + proxy (defects #4, #5, #6)
|
||||
|
||||
1. Replace XDG fallbacks with `os.UserConfigDir()` / `os.UserCacheDir()`
|
||||
on Windows (keep XDG honoring on Unix). Centralise in one
|
||||
`configDir()` / `dataDir()` helper so it's not re-derived.
|
||||
2. Extend the OAuth credential path sets with OS-appropriate locations
|
||||
(macOS `~/Library/Application Support/...`, Windows `%AppData%/...`).
|
||||
3. Ensure every `http.Client` uses a transport with
|
||||
`Proxy: http.ProxyFromEnvironment`. For Windows system-proxy/PAC,
|
||||
document the env-var workaround now; optionally vendor a PAC-aware
|
||||
transport (e.g. `github.com/rapid7/go-get-proxied`) later. This
|
||||
overlaps the shared-client work in
|
||||
`2026-06-04-egress-allowlist.md` — do the proxy transport once, in
|
||||
the shared client.
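A minimal sketch of the centralised helper from step 1, assuming the rule is "explicit `XDG_CONFIG_HOME` wins on non-Windows systems, otherwise use the OS default". `configDirFrom` and `appName` are hypothetical names; only `os.UserConfigDir` is a real standard-library call.

```go
package main

import (
	"os"
	"path/filepath"
	"runtime"
)

const appName = "gnoma"

// configDirFrom picks the config base. On non-Windows systems an explicit
// XDG_CONFIG_HOME wins; otherwise the OS default (as returned by
// os.UserConfigDir) is used. Inputs are parameters so the rule can be
// tested on any OS.
func configDirFrom(goos, xdgConfigHome, osDefault string) string {
	if goos != "windows" && xdgConfigHome != "" {
		return filepath.Join(xdgConfigHome, appName)
	}
	return filepath.Join(osDefault, appName)
}

// configDir resolves the directory for the running process.
func configDir() (string, error) {
	base, err := os.UserConfigDir()
	if err != nil {
		return "", err
	}
	return configDirFrom(runtime.GOOS, os.Getenv("XDG_CONFIG_HOME"), base), nil
}
```

Note that `os.UserConfigDir` already honours `XDG_CONFIG_HOME` on Linux but not on macOS, which is why the sketch checks the variable explicitly.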
|
||||
|
||||
### Phase 5 — safety classifier + terminal (defects #7, #8)
|
||||
|
||||
Extend `internal/safety/cwd.go` system-root and personal-dir sets per
|
||||
OS; add a manual verification note for legacy Windows terminals.
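One way the per-OS system-root check could look, as a sketch only. The root lists are illustrative and are not the sets in `internal/safety/cwd.go`; `systemRootsFor` and `isSystemDir` are hypothetical names.

```go
package main

import "strings"

// systemRootsFor lists directories treated as system-owned for each OS.
// The entries are illustrative placeholders, not the project's real sets.
func systemRootsFor(goos string) []string {
	switch goos {
	case "darwin":
		return []string{"/System", "/Library", "/opt", "/usr/local"}
	case "windows":
		return []string{`C:\Windows`, `C:\Program Files`}
	default:
		return []string{"/usr", "/etc", "/bin", "/opt"}
	}
}

// isSystemDir reports whether dir equals a system root or sits below one.
// It compares whole path segments, so "/optional" is not treated as a
// child of "/opt". Windows paths are compared case-insensitively.
func isSystemDir(goos, dir string) bool {
	sep := "/"
	if goos == "windows" {
		sep = `\`
		dir = strings.ToLower(dir)
	}
	for _, root := range systemRootsFor(goos) {
		if goos == "windows" {
			root = strings.ToLower(root)
		}
		if dir == root || strings.HasPrefix(dir, root+sep) {
			return true
		}
	}
	return false
}
```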
|
||||
|
||||
---
|
||||
|
||||
## Touch-points (file:line)
|
||||
|
||||
| Defect | Location |
|
||||
|---|---|
|
||||
| Bash shell | `internal/tool/bash/bash.go:117`, `aliases.go:115,148` |
|
||||
| Llamafile `sh` | `internal/slm/manager.go:172` |
|
||||
| MCP kill stub | `internal/mcp/transport_windows.go:10-18` |
|
||||
| Config/data dirs | `internal/config/load.go:52-59`, `internal/slm/manager.go:25-35` |
|
||||
| OAuth paths | `internal/provider/google/provider.go:188-204` |
|
||||
| Proxy | shared `http.Client` (see egress plan) |
|
||||
| Safety classifier | `internal/safety/cwd.go:185-252` |
|
||||
| CI matrix | `.github/workflows/` (new test job), `release.yml` |
|
||||
|
||||
---
|
||||
|
||||
## Testing (TDD — write first)
|
||||
|
||||
- **OS-gated unit tests** (run on each matrix OS):
|
||||
- `defaultShell()` returns a runnable shell per OS; `quoteArg`
|
||||
round-trips a value containing spaces/quotes through the real shell.
|
||||
- `configDir()`/`dataDir()` return the OS-correct base.
|
||||
- Job-object kill: spawn a child that spawns a grandchild; assert
|
||||
both are gone after `killProcessTree` (Windows).
|
||||
- `safety.ClassifyCWD` flags OS-appropriate system/personal dirs.
|
||||
- **Existing tests** that `t.Skip` on Windows
|
||||
(`internal/tool/fs/guard_test.go`,
|
||||
`internal/provider/subprocess/stream_test.go`) — audit whether the
|
||||
skip hides a real gap now that Windows is a target.
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
1. CI smoke matrix is green on `windows-latest` + `macos-latest`.
|
||||
2. `gnoma --version` and a stubbed pipe run succeed on a Windows runner.
|
||||
3. A bash-tool command with quoted args runs on Windows (PowerShell).
|
||||
4. An MCP server that spawns a child leaves no orphan after shutdown on
|
||||
Windows.
|
||||
5. Config lands in `%AppData%\gnoma` on Windows, `~/.config/gnoma` on
|
||||
Linux.
|
||||
|
||||
---
|
||||
|
||||
## TODO linkage
|
||||
|
||||
Promotes the "Cross-platform support — Windows + macOS" entry in
|
||||
`TODO.md`. The Phase-2 r/devops question table stays in the TODO as the
|
||||
public-facing answer map; link this plan for the implementation detail.
|
||||
@@ -0,0 +1,169 @@
|
||||
# Distribution Follow-ups — 2026-06-04
|
||||
|
||||
Hardens and broadens the release pipeline. v0.1.0+ already ships static
|
||||
archives (GitHub mirror releases) and multi-arch Docker images (GHCR)
|
||||
via GoReleaser. This plan covers the optional follow-ups listed under
|
||||
"Distribution — follow-ups" in TODO.md: signed checksums, Homebrew tap,
|
||||
`curl | sh` installer, release-note automation, and the
|
||||
`dockers`→`dockers_v2` migration.
|
||||
|
||||
---
|
||||
|
||||
## Current state (confirmed)
|
||||
|
||||
- **`.goreleaser.yml`:** 6-target build matrix (linux/darwin/windows ×
|
||||
amd64/arm64), CGO disabled, version injected via ldflags
|
||||
(`-X main.buildVersion/buildCommit/buildDate`; read at
|
||||
`cmd/gnoma/main.go:55-60`, printed at `:95-98`). Archives: tar.gz
|
||||
(zip on Windows). Checksums: plain SHA256 `checksums.txt`,
|
||||
**unsigned**. Docker: separate per-arch `dockers` blocks +
|
||||
`docker_manifests` for the multi-arch manifest. Release published to
|
||||
GitHub mirror (`release.github` owner `VikingOwl91`).
|
||||
- **`.github/workflows/release.yml`:** triggers on `v*` tags, sets up
|
||||
QEMU + Buildx, logs into GHCR with the built-in `GITHUB_TOKEN`, runs
|
||||
`go test ./...` (Linux only), then `goreleaser release --clean` with
|
||||
`GORELEASER_CURRENT_TAG` set. **No signing step.**
|
||||
- **`Dockerfile`:** distroless `static:nonroot`, copies the
|
||||
GoReleaser-built binary in. Architecture-agnostic (binary built
|
||||
before `COPY`).
|
||||
- **No** Homebrew tap, install script, or Makefile release target.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Authenticode (Windows) / Gatekeeper notarization (macOS) code
|
||||
signing.** These need a paid EV cert / Apple Developer account —
|
||||
tracked separately (the cross-platform TODO documents the
|
||||
"right-click → Unblock" workaround). Sigstore/cosign here is for
|
||||
*checksum* signing, which needs no paid cert.
|
||||
- **MSI installer.** Lives in the cross-platform plan, gated on demand.
|
||||
- **Changing the canonical repo flow.** PRs still go to the Gitea
|
||||
upstream; the GitHub mirror remains the release/CI surface.
|
||||
|
||||
---
|
||||
|
||||
## Design (independent work items — ship in any order)
|
||||
|
||||
### 1. Signed checksums (cosign / sigstore keyless)
|
||||
|
||||
Add a GoReleaser `signs` block that signs `checksums.txt` with cosign
|
||||
in **keyless** mode (OIDC via the GitHub Actions token — no stored
|
||||
private key, no cert cost):
|
||||
|
||||
- Add `cosign` install + `id-token: write` permission to
|
||||
`release.yml`.
|
||||
- GoReleaser `signs:` → `cmd: cosign`, `args: sign-blob` producing
|
||||
`checksums.txt.sig` + `.pem` (cert bundle) as release artifacts.
|
||||
- Document verification:
|
||||
`cosign verify-blob --certificate ... --signature ... checksums.txt`.
|
||||
|
||||
Acceptance: a downloaded release verifies against the published
|
||||
signature + certificate, including the Rekor transparency-log check.
|
||||
|
||||
### 2. Homebrew tap
|
||||
|
||||
Create a tap repo (`VikingOwl91/homebrew-tap`) and add GoReleaser's
|
||||
`brews:` block targeting it. Needs a PAT with `contents:write` on the
|
||||
tap repo (the default `GITHUB_TOKEN` can't push to a *second* repo) —
|
||||
store as `HOMEBREW_TAP_TOKEN` secret. Formula installs the darwin/linux
|
||||
archives.
|
||||
|
||||
Acceptance: `brew install vikingowl91/tap/gnoma` installs a working
|
||||
binary on macOS + Linuxbrew; `gnoma --version` matches the tag.
|
||||
|
||||
### 3. `curl | sh` installer
|
||||
|
||||
Add `install.sh` (committed at repo root, served via the raw GitHub
|
||||
mirror) that:
|
||||
|
||||
- Detects OS/arch, maps to the GoReleaser archive name template
|
||||
(`gnoma_<ver>_<os>_<arch>.<ext>`).
|
||||
- Resolves the latest release via the GitHub API (or honours a pinned
|
||||
`GNOMA_VERSION`).
|
||||
- Downloads the archive **and** `checksums.txt`, verifies the SHA256
|
||||
before extracting (and the cosign signature if cosign is present).
|
||||
- Installs to `~/.local/bin` (or `$GNOMA_INSTALL_DIR`), prints a PATH
|
||||
hint.
|
||||
|
||||
Keep it POSIX-sh, no bashisms. Acceptance:
|
||||
`curl -fsSL <raw>/install.sh | sh` yields a runnable `gnoma` on a clean
|
||||
Linux + macOS box; checksum mismatch aborts.
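The installer itself stays POSIX sh, but the name-mapping and verify-before-extract logic it has to implement can be sketched in Go for clarity. This assumes `checksums.txt` uses the usual `sha256sum` layout (`<hex>  <file>` per line); `archiveName`, `sha256Hex`, and `verifyChecksum` are hypothetical names.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// archiveName follows the gnoma_<ver>_<os>_<arch>.<ext> template:
// tar.gz everywhere except zip on Windows.
func archiveName(version, goos, goarch string) string {
	ext := "tar.gz"
	if goos == "windows" {
		ext = "zip"
	}
	return fmt.Sprintf("gnoma_%s_%s_%s.%s", version, goos, goarch, ext)
}

// sha256Hex returns the lowercase hex SHA-256 digest of data.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verifyChecksum finds name in a checksums.txt body and compares the listed
// digest with the digest of data. A missing entry is an error, so an
// unlisted archive is never installed.
func verifyChecksum(data []byte, checksums, name string) error {
	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 || strings.TrimPrefix(fields[1], "*") != name {
			continue
		}
		if !strings.EqualFold(fields[0], sha256Hex(data)) {
			return fmt.Errorf("checksum mismatch for %s", name)
		}
		return nil
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return fmt.Errorf("no checksum entry for %s", name)
}
```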
|
||||
|
||||
### 4. Release-note automation
|
||||
|
||||
GoReleaser already generates a filtered changelog (excludes
|
||||
docs/test/chore/style). Enrich it:
|
||||
|
||||
- Group commits by Conventional-Commit type
|
||||
(`changelog.groups` with title regexes for feat/fix/perf/refactor).
|
||||
- Add a release header template pointing to the upstream Gitea repo and
|
||||
the install methods (brew / curl | sh / docker).
|
||||
|
||||
Acceptance: a tagged release's GitHub notes show grouped sections + an
|
||||
install snippet, with no docs/chore noise.
|
||||
|
||||
### 5. `dockers` → `dockers_v2` migration
|
||||
|
||||
Collapse the two per-arch `dockers` blocks + `docker_manifests` into a
|
||||
single `dockers_v2` block (GoReleaser's newer multi-platform builder).
|
||||
The current `Dockerfile` is architecture-agnostic (binary copied
|
||||
post-build), so verify whether `dockers_v2`'s expected per-platform
|
||||
binary layout needs a `Dockerfile` change or a `templates`/`extra_files`
|
||||
tweak — the TODO flags this as the reason it was deferred. Do it in its
|
||||
own commit; diff the resulting GHCR manifest against the current one to
|
||||
prove parity (same tags: `<ver>-amd64`, `<ver>-arm64`, `<ver>`,
|
||||
`latest`).
|
||||
|
||||
Acceptance: GHCR still publishes a multi-arch manifest with identical
|
||||
tags + labels; `docker pull --platform linux/arm64` works.
|
||||
|
||||
### 6. (Carry-over) Windows process-tree kill
|
||||
|
||||
Listed under the same TODO bullet, but it is a *runtime* concern; implemented in
|
||||
`2026-06-04-cross-platform.md` Phase 3 (job objects). Cross-linked here
|
||||
only so the TODO bullet's reference resolves.
|
||||
|
||||
---
|
||||
|
||||
## Touch-points (file:line)
|
||||
|
||||
| Item | Location |
|
||||
|---|---|
|
||||
| Signing, brews, changelog groups, dockers_v2 | `.goreleaser.yml` |
|
||||
| cosign install, `id-token` perm, tap token | `.github/workflows/release.yml` |
|
||||
| Installer | new `install.sh` (repo root) |
|
||||
| Dockerfile (if dockers_v2 needs it) | `Dockerfile` |
|
||||
| Tap repo | new `VikingOwl91/homebrew-tap` |
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
Distribution is config + scripts, so testing is mostly pipeline-level:
|
||||
|
||||
- **Dry run:** `goreleaser release --snapshot --clean` locally must
|
||||
produce signed checksums, brew formula, and the dockers_v2 manifest
|
||||
without publishing.
|
||||
- **install.sh:** a `shellcheck` gate + a CI job that runs it against
|
||||
the latest release on linux + macos runners and asserts
|
||||
`gnoma --version`.
|
||||
- **Checksum/signature negative test:** corrupt the archive → installer
|
||||
aborts; tampered checksums → cosign verify fails.
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
1. A tagged release publishes `checksums.txt` + `.sig` + `.pem`,
|
||||
verifiable with cosign keyless.
|
||||
2. `brew install vikingowl91/tap/gnoma` works on macOS.
|
||||
3. `curl -fsSL <raw>/install.sh | sh` works on clean Linux + macOS,
|
||||
with checksum verification.
|
||||
4. Release notes are grouped and carry install instructions.
|
||||
5. GHCR multi-arch manifest is unchanged after the dockers_v2 swap.
|
||||
|
||||
---
|
||||
|
||||
## TODO linkage
|
||||
|
||||
Promotes the "Distribution — follow-ups" entry in `TODO.md`. Link this
|
||||
file; the Windows job-object sub-item points at the cross-platform plan.
|
||||
@@ -0,0 +1,236 @@
|
||||
# Network Egress Allowlist — 2026-06-04
|
||||
|
||||
Adds a per-host network egress boundary to the security layer via a
|
||||
Learn → Review → Enforce rollout. Promotes the second half of the
|
||||
TODO.md entry "Security boundary — egress controls + session audit log"
|
||||
into a phased design.
|
||||
|
||||
---
|
||||
|
||||
## Status of the sibling item: per-session audit log — DONE
|
||||
|
||||
The first half of the TODO entry (per-session audit log of
|
||||
blocked/redacted events) is **already implemented**:
|
||||
|
||||
- `internal/security/audit.go` defines `AuditLogger` / `AuditEvent`,
|
||||
writing append-only JSONL at mode `0o600`, incognito-gated,
|
||||
best-effort (write failures never break the scan pipeline).
|
||||
- `cmd/gnoma/main.go:685-691` wires it to
|
||||
`<projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl`.
|
||||
- `internal/security/firewall.go` records events at `:152` (unicode
|
||||
sanitize), `:173` (block), `:186` (redact).
|
||||
|
||||
**Remaining audit-log gap:** there is no CLI surface to *read* it. The
|
||||
TODO's promise — answer "what did the firewall do this session?" in one
|
||||
command — needs a `gnoma firewall audit` subcommand (no `firewall`
|
||||
subcommand exists today; top-level commands are `providers`, `slm`,
|
||||
`router`, `profile`). That viewer is folded into Phase 3 below since it
|
||||
shares the `gnoma firewall` command surface with `firewall review`.
|
||||
|
||||
The rest of this plan is the genuinely-unbuilt egress allowlist.
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
The current `Firewall` is a **content** boundary only: it scans
|
||||
messages and tool results for secrets (regex + Shannon entropy) and
|
||||
redacts/blocks/warns. It does **not** enforce network egress. Outgoing
|
||||
HTTP uses stock clients with no per-host allowlist and no dial-layer
|
||||
interception, so a compromised tool, MCP server, or prompt-injected
|
||||
provider call can reach any host.
|
||||
|
||||
The README and v0.3.0 launch post oversold "network egress gated";
|
||||
this plan makes that claim true.
|
||||
|
||||
### Why this is hard: no egress chokepoint today
|
||||
|
||||
Outgoing HTTP is constructed in many places, none sharing a client:
|
||||
|
||||
- **Provider SDKs** each build their own `http.Client` internally:
|
||||
- anthropic (`internal/provider/anthropic/provider.go:36`,
|
||||
`anthropic.NewClient`)
|
||||
- openai (`internal/provider/openai/provider.go:46`, `oai.NewClient`)
|
||||
- mistral (`internal/provider/mistral/provider.go:33`,
|
||||
`mistralgo.NewClient`)
|
||||
- google genai (`internal/provider/google/provider.go:239,306`)
|
||||
- **Non-SDK direct calls** using `http.DefaultClient` or ad-hoc
|
||||
`&http.Client{}`:
|
||||
- `internal/router/discovery.go` (`:65,141,325,365`)
|
||||
- `internal/router/probe.go` (`:24,72`)
|
||||
- `internal/slm/backend.go` (`:266,294,316,343`)
|
||||
- `internal/slm/download.go` (`:22`)
|
||||
- `internal/slm/manager.go` (`:273`)
|
||||
|
||||
No custom `http.Client` is injected anywhere today. **But** every SDK
|
||||
supports injecting one, which is the enabler for a single chokepoint.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **TLS interception / MITM.** We allowlist by destination host, not by
|
||||
inspecting decrypted payloads. Content inspection stays the
|
||||
firewall's job.
|
||||
- **Blocking the provider SDKs' own retry/telemetry hosts by default.**
|
||||
Model-provider hosts are baseline-allowed (see below).
|
||||
- **Replacing the OS/network firewall.** This is an in-process
|
||||
application-level guard, defense-in-depth, not a substitute for real
|
||||
network controls. Document this honestly (the README over-claim is
|
||||
the cautionary tale).
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
### The chokepoint: one shared `http.Client` with a guarded dialer
|
||||
|
||||
Build a single `*http.Client` whose `Transport.DialContext` validates
|
||||
the destination against the allowlist **before** the connection is
|
||||
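made. On direct connections, `DialContext` receives `host:port`
|
||||
pre-resolution, so host-based matching works without DNS races; behind
|
||||
an HTTP proxy it receives the proxy's address instead, so that case
|
||||
needs separate handling. Thread this client everywhere.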
|
||||
|
||||
```
|
||||
internal/security/egress/
|
||||
guard.go // EgressGuard: mode + allowlist + Decide(host, callerCtx) ResultEnum
|
||||
dialer.go // GuardedDialer wrapping net.Dialer.DialContext
|
||||
client.go // HTTPClient(guard) *http.Client
|
||||
store.go // learned-destinations persistence (per project)
|
||||
baseline.go // curated ship-in-binary allowlist
|
||||
```
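A minimal sketch of what `dialer.go` and `client.go` might contain, assuming a plain host set as the allowlist. All identifiers here (`dialFunc`, `guardedDial`, `errEgressBlocked`, `isBlocked`, `countingDialer`, `newGuardedClient`) are hypothetical; the only real APIs used are `net.SplitHostPort`, `net.Dialer.DialContext`, and `http.Transport.DialContext`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"strings"
	"time"
)

// dialFunc matches the signature of http.Transport.DialContext.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

var errEgressBlocked = errors.New("egress blocked")

// guardedDial checks the destination host against allowed before handing
// the call to inner. A blocked host never reaches the inner dialer.
func guardedDial(allowed map[string]bool, inner dialFunc) dialFunc {
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("egress: bad address %q: %w", addr, err)
		}
		if !allowed[strings.ToLower(host)] {
			return nil, fmt.Errorf("%w: %s", errEgressBlocked, host)
		}
		return inner(ctx, network, addr)
	}
}

// isBlocked reports whether err came from the guard rather than the network.
func isBlocked(err error) bool { return errors.Is(err, errEgressBlocked) }

// countingDialer is a fake inner dialer for tests: it records calls and
// never opens a connection.
func countingDialer(calls *int) dialFunc {
	return func(context.Context, string, string) (net.Conn, error) {
		*calls++
		return nil, nil
	}
}

// newGuardedClient builds the shared client. No Proxy is set here: with a
// proxy configured, DialContext would see the proxy address, not the
// destination host.
func newGuardedClient(allowed map[string]bool) *http.Client {
	d := &net.Dialer{Timeout: 30 * time.Second}
	return &http.Client{
		Transport: &http.Transport{DialContext: guardedDial(allowed, d.DialContext)},
	}
}
```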
|
||||
|
||||
**Injection mechanism per SDK** (each differs — enumerate, don't assume):
|
||||
|
||||
| Client | Mechanism |
|
||||
|---|---|
|
||||
| anthropic | `option.WithHTTPClient(c)` appended in `anthropic/provider.go` |
|
||||
| openai | `option.WithHTTPClient(c)` appended in `openai/provider.go` |
|
||||
| google genai | `genai.ClientConfig{HTTPClient: c}` in `google/provider.go` |
|
||||
| mistral | **user's own SDK** — add `WithHTTPClient` option if absent (`github.com/VikingOwl91/mistral-go-sdk`), then use it |
|
||||
| non-SDK paths | replace `http.DefaultClient` with the shared client in `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` |
|
||||
|
||||
Plumb the shared client into providers by adding
|
||||
`HTTPClient *http.Client` to `provider.ProviderConfig`
|
||||
(`internal/provider/registry.go:8-16`) and setting it in
|
||||
`createProvider`. The non-SDK paths take the client via their existing
|
||||
constructors / a package-level setter.
|
||||
|
||||
> The non-SDK paths are the trap: if any is missed it punches a hole in
|
||||
> the allowlist. Treat the list above as a checklist; add a grep test
|
||||
> (Phase 4) that fails if `http.DefaultClient` reappears.
|
||||
|
||||
### Three-stage rollout (not a single "block everything" default)
|
||||
|
||||
**Learn.** First runs log every egress destination per `(project,
|
||||
agent, tool)` tuple to the per-project store **without blocking**.
|
||||
Reuse the audit JSONL discipline (atomic, incognito-gated).
|
||||
|
||||
**Review.** `gnoma firewall review` surfaces the captured set; the user
|
||||
marks each destination `allow | deny | scoped` (scoped = only reachable
|
||||
by named tool/agent). Persist to `.gnoma/firewall/allowlist.toml`
|
||||
(project) — subject to the same `omitempty`/atomic-write discipline as
|
||||
the config-migration plan (`2026-05-24-config-migration.md`) to avoid
|
||||
the zero-spam corruption class.
|
||||
|
||||
**Enforce.** When mode is `enforce`, unrecognised destinations are
|
||||
blocked with a clear violation logged to the **same per-session
|
||||
`audit.jsonl`** (new `AuditEvent.Action = "egress_block"`). Mode is
|
||||
`[security.egress].mode = "off" | "learn" | "enforce"`, default `off`
|
||||
(opt-in; shipping `enforce` on by default would break first-run UX).
|
||||
|
||||
### Baseline allowlist (curated, ship-in-binary)
|
||||
|
||||
`baseline.go` seeds the allowlist so Enforce mode is usable immediately:
|
||||
|
||||
- **Package ecosystems:** github.com, registry.npmjs.org, pypi.org,
|
||||
files.pythonhosted.org, crates.io, static.crates.io,
|
||||
registry-1.docker.io, proxy.golang.org, sum.golang.org.
|
||||
- **Model providers:** anthropic, openai, google, mistral, **minimax**
|
||||
(per `2026-06-04-minimax-provider.md`) — host set derived from the
|
||||
effective `[provider.endpoints]` map so user-configured local
|
||||
ollama/llamacpp endpoints are auto-allowed.
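Deriving baseline hosts from the endpoints map could look like the sketch below. It assumes the effective `[provider.endpoints]` table is available as a `map[string]string` of provider name to URL; `hostsFromEndpoints` is a hypothetical name.

```go
package main

import (
	"net/url"
	"sort"
	"strings"
)

// hostsFromEndpoints extracts the host of every configured endpoint URL.
// Entries that do not parse, or that carry no host, are skipped rather
// than allowed. The result is de-duplicated and sorted.
func hostsFromEndpoints(endpoints map[string]string) []string {
	seen := map[string]bool{}
	for _, raw := range endpoints {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue
		}
		seen[strings.ToLower(u.Hostname())] = true
	}
	hosts := make([]string, 0, len(seen))
	for h := range seen {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	return hosts
}
```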
|
||||
|
||||
The painful middle ground is SDK egress (sentry, stripe, supabase,
|
||||
datadog…). These break a naive "block unknown" default, which is
|
||||
exactly why Learn → Review → Enforce is the only flow that scales.
|
||||
|
||||
### Per-tool scoping
|
||||
|
||||
`scoped` destinations carry an allowed-tool/agent set. Enforcement
|
||||
checks the calling context — the engine already knows which tool is
|
||||
running (it threads per-tool context for redaction logging today). Pass
|
||||
the tool/agent identity into `EgressGuard.Decide(host, callerCtx)`.
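The mode and scoping rules can be sketched as a small decision function. This is an assumption-laden illustration: the field names, the `Decision` type, and the use of a plain tool name as caller context are all hypothetical, and learn mode here records only the host rather than the full `(project, agent, tool)` tuple.

```go
package main

// Mode mirrors the [security.egress].mode values.
type Mode string

const (
	ModeOff     Mode = "off"
	ModeLearn   Mode = "learn"
	ModeEnforce Mode = "enforce"
)

// Decision is the outcome of an egress check.
type Decision int

const (
	Allow Decision = iota
	Block
)

// EgressGuard holds the mode and the allowlist state (hypothetical layout).
type EgressGuard struct {
	Mode    Mode
	Allowed map[string]bool            // baseline + user-allowed hosts
	Scoped  map[string]map[string]bool // host -> tools allowed to reach it
	Learned map[string]bool            // hosts observed in learn mode
}

// Decide applies the rollout rules: off allows everything, learn allows and
// records, enforce allows only allowlisted hosts or scoped hosts reached by
// a named tool. Unknown mode strings fall through to the "off" behaviour;
// config validation is assumed to reject them earlier.
func (g *EgressGuard) Decide(host, tool string) Decision {
	switch g.Mode {
	case ModeLearn:
		if g.Learned == nil {
			g.Learned = map[string]bool{}
		}
		g.Learned[host] = true
		return Allow
	case ModeEnforce:
		if g.Allowed[host] || g.Scoped[host][tool] {
			return Allow
		}
		return Block
	default:
		return Allow
	}
}
```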
|
||||
|
||||
---
|
||||
|
||||
## Interactions
|
||||
|
||||
- **Incognito:** Learn-mode writes are gated by incognito exactly like
|
||||
the audit log (`IncognitoMode.ShouldLogContent`). Enforcement still
|
||||
applies in incognito (security is not relaxed); only the *learning*
|
||||
persistence is suppressed.
|
||||
- **Config layering:** the allowlist file is a new corruption surface —
|
||||
follow `2026-05-24-config-migration.md` #1 discipline.
|
||||
- **SafeProvider:** egress is orthogonal to the content `SafeProvider`
|
||||
wrap; it lives one layer down at the transport. Both must hold.
|
||||
|
||||
---
|
||||
|
||||
## Touch-points (file:line)
|
||||
|
||||
| Change | Location |
|
||||
|---|---|
|
||||
| New egress package | `internal/security/egress/` |
|
||||
| `HTTPClient` field | `internal/provider/registry.go:8-16` |
|
||||
| Provider client injection | `anthropic/provider.go`, `openai/provider.go`, `google/provider.go`, `mistral/provider.go` |
|
||||
| mistral SDK `WithHTTPClient` | `github.com/VikingOwl91/mistral-go-sdk` (if absent) |
|
||||
| Non-SDK client swap | `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` |
|
||||
| `audit.go` egress action | `internal/security/audit.go` (`AuditEvent`) |
|
||||
| Config `[security.egress]` | `internal/config/config.go` (SecuritySection ~`:280-306`) |
|
||||
| `gnoma firewall` command | `cmd/gnoma/main.go` subcommand dispatch (~`:178`) |
|
||||
| Allowlist store | `.gnoma/firewall/allowlist.toml` |
|
||||
|
||||
---
|
||||
|
||||
## Testing (TDD — write first)
|
||||
|
||||
- **Unit:**
|
||||
- `EgressGuard.Decide`: off → always allow; learn → allow + record;
|
||||
enforce → allow baseline/allowlisted, block unknown, scoped host
|
||||
allowed only for the named tool.
|
||||
- `GuardedDialer` blocks a non-allowlisted `host:port` before dial
|
||||
(use a guard with a closed allowlist; assert no connection
|
||||
attempt — inject a fake inner dialer that records calls).
|
||||
- Baseline expansion: `[provider.endpoints]` hosts are auto-allowed;
|
||||
a local ollama URL becomes an allowlist entry.
|
||||
- Allowlist store round-trips without zero-spam corruption.
|
||||
- `audit.jsonl` gains an `egress_block` record on a blocked dial.
|
||||
- **Grep/guard test:** fails if `http.DefaultClient` is used in
|
||||
provider/router/slm packages (prevents regressions reopening the
|
||||
hole).
|
||||
- **Integration (`//go:build integration`):** with mode=enforce and a
|
||||
minimal allowlist, a provider call to an allowed host succeeds and a
|
||||
tool fetch to a blocked host fails with a logged violation.
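The grep/guard test could be built on the Go parser rather than a text search, so that comments and string literals do not trigger it. The sketch below is one possible shape; `usesDefaultClient` is a hypothetical helper that a real test would run over every file in the provider/router/slm packages.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
)

// usesDefaultClient reports whether Go source text references
// http.DefaultClient in code. Mentions in comments or string literals do
// not count because the check walks the syntax tree.
func usesDefaultClient(filename, src string) (bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return false, err
	}
	found := false
	ast.Inspect(file, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		pkg, ok := sel.X.(*ast.Ident)
		if ok && pkg.Name == "http" && sel.Sel.Name == "DefaultClient" {
			found = true
			return false
		}
		return true
	})
	return found, nil
}
```

This only catches the literal selector. Package-level helpers such as `http.Get` also go through the default client, and an import alias would evade the name match, so a production check would likely need to cover those as well.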
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
1. `mode="off"` (default) → behaviour identical to today.
|
||||
2. `mode="learn"` → every outbound host appears in the store; nothing
|
||||
is blocked.
|
||||
3. `gnoma firewall review` lists learned hosts and persists
|
||||
allow/deny/scoped decisions.
|
||||
4. `mode="enforce"` → baseline + allowlisted hosts reachable; an
|
||||
un-allowlisted host is blocked with an `egress_block` line in
|
||||
`.gnoma/sessions/<id>/audit.jsonl`.
|
||||
5. `gnoma firewall audit` prints this session's firewall events
|
||||
(block/redact/egress) in a grep-friendly form. (Closes the
|
||||
remaining audit-log gap.)
|
||||
6. Scoped destination reachable by its named tool only.
|
||||
|
||||
---
|
||||
|
||||
## TODO linkage
|
||||
|
||||
Replaces the egress half of the "Security boundary — egress controls +
|
||||
session audit log" entry in `TODO.md`. Update that entry to mark the
|
||||
audit log implemented and link this file for the egress work.
|
||||
@@ -0,0 +1,224 @@
|
||||
# MiniMax Provider — 2026-06-04
|
||||
|
||||
Adds MiniMax (<https://platform.minimax.io>) as a first-class cloud
|
||||
provider so it can register as a router arm alongside
|
||||
anthropic/openai/google/mistral. Promotes the TODO.md entry
|
||||
"MiniMax provider — cloud arm + subscription token plan" out of
|
||||
bullet form into a phased design.
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
Gnoma has no MiniMax adapter. MiniMax ships strong, very cheap coding
|
||||
models (M2 family) that are a natural fit for the cheap-high-capability
|
||||
cloud tier the router already reasons about via `CostWeight`. Two facts
|
||||
make the integration cheap:
|
||||
|
||||
1. MiniMax exposes **both** an OpenAI-compatible and an
|
||||
Anthropic-compatible HTTP surface, so no new translation layer is
|
||||
needed — gnoma already has both `internal/provider/openaicompat`
|
||||
(built on the OpenAI SDK) and `internal/provider/anthropic` with a
|
||||
working `BaseURL` override.
|
||||
2. `envKeyFor`'s default branch (`cmd/gnoma/main.go:1199-1200`) already
|
||||
handles any unknown provider name, so `minimax` resolves to
|
||||
`MINIMAX_API_KEY` with no code change.
|
||||
|
||||
The remaining work is wiring (a constructor + switch cases +
|
||||
enumerations), routing metadata (family defaults, rate limits), and a
|
||||
**design decision around the subscription billing model** that the
|
||||
router's metered-cost assumption does not currently handle.
|
||||
|
||||
### External facts (VERIFY at implementation — MiniMax docs move fast)
|
||||
|
||||
These were confirmed 2026-06-04 but the model lineup and pricing are
|
||||
revised frequently (a pricing overhaul landed 2026-06-02). Re-verify
|
||||
against the live docs before hardcoding anything:
|
||||
|
||||
- **OpenAI-compatible base URL:** `https://api.minimax.io/v1`
|
||||
(international). A separate region endpoint exists
|
||||
(`api.minimaxi.com`); confirm the exact host + whether gnoma should
|
||||
expose a region toggle. Docs:
|
||||
<https://platform.minimax.io/docs/api-reference/text-openai-api>
|
||||
- **Anthropic-compatible endpoint:** exists ("two equivalent
|
||||
endpoints, one mimics OpenAI, one mimics Anthropic"). Confirm the
|
||||
exact path/host before choosing it over OpenAI-compat.
|
||||
- **Models (do NOT hardcode a single ID):** MiniMax-M2, M2.1, M2.5,
|
||||
M2.7 (+ `-highspeed` variants), M3. Coding-relevant default is the
|
||||
current M2-coding model — at time of writing M2.5 for PAYG, M2.1 for
|
||||
the subscription plan. **Treat the default as config, not a
|
||||
constant**, and call `Models(ctx)` to enumerate live.
|
||||
- **Pricing (PAYG, for `CostPer1k*` metadata):** M2.7 ≈ $0.30 / MTok
|
||||
input, $1.20 / MTok output; highspeed ≈ 2×. Convert to the EUR
|
||||
per-1k convention used by the Arm struct. Docs:
|
||||
<https://platform.minimax.io/docs/guides/pricing-token-plan>
|
||||
- **Subscription:** "Token Plan" (current; supersedes the former
|
||||
"Coding Plan"). Flat-rate prompt quota over a rolling window
|
||||
(published M2.7 limits 1,500–30,000 requests / 5h across tiers).
|
||||
Same Bearer key as PAYG.
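The pricing conversion mentioned above is simple arithmetic: a price per million tokens divided by 1000 gives the price per thousand tokens, so $0.30 / MTok is $0.0003 per 1k tokens before currency conversion. A sketch, with `costPer1k` as a hypothetical helper and the USD→EUR rate left as a caller-supplied input (no rate is assumed here):

```go
package main

// costPer1k converts a USD price per million tokens into a price per
// thousand tokens in EUR. usdToEUR is supplied by the caller; this sketch
// does not assume any particular exchange rate.
func costPer1k(usdPerMTok, usdToEUR float64) float64 {
	return usdPerMTok / 1000 * usdToEUR
}
```

Whether the rate is fixed at build time or read from config is a design choice the plan leaves open.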
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **A bespoke MiniMax SDK / translation layer.** We reuse the existing
|
||||
OpenAI-compat (default) or Anthropic provider via `BaseURL`. If
|
||||
MiniMax adds non-standard body fields, use the existing
|
||||
`openai.NewWithStreamOptions` escape hatch (the same one Ollama uses).
|
||||
- **Region auto-detection.** Ship the international endpoint as the
|
||||
default; the user can override via `[provider.endpoints]`. A region
|
||||
toggle is a follow-up if anyone asks.
|
||||
- **Full subscription-quota accounting.** Phase 2 models subscription
|
||||
cost as a coarse `CostWeight` zero-out, not a live quota meter.
|
||||
|
||||
---
|
||||
|
||||
## Decision: OpenAI-compat vs Anthropic-compat backing
|
||||
|
||||
**Default to OpenAI-compat** (`internal/provider/openaicompat`). It is
|
||||
already exercised by the local backends (ollama/llamacpp), so the
|
||||
streaming, tool-call, and error paths are battle-tested in this repo.
|
||||
The Anthropic-compat endpoint is a fallback only if a MiniMax feature
|
||||
(e.g. extended thinking) is exposed solely through it. Keep the option
|
||||
open by making the backing selectable via config
|
||||
(`[provider.minimax].api = "openai" | "anthropic"`), defaulting to
|
||||
`openai`.
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
### Phase 1 — provider wiring (smallest shippable slice)
|
||||
|
||||
Goal: `gnoma --provider minimax` works against PAYG with metered
|
||||
pricing, registered as a cloud arm.
|
||||
|
||||
1. **Constructor.** Add `NewMiniMax(cfg provider.ProviderConfig)
|
||||
(provider.Provider, error)` to
|
||||
`internal/provider/openaicompat/provider.go`, mirroring `NewOllama`
|
||||
/ `NewLlamaCpp` (`openaicompat/provider.go:18-49`):
|
||||
- Default `BaseURL` to `https://api.minimax.io/v1` when unset (but
|
||||
let `[provider.endpoints].minimax` override).
|
||||
- Require a real API key (unlike Ollama's dummy key) — return an
|
||||
error if `cfg.APIKey == ""`.
|
||||
- Leave `MaxRetries` at the SDK default (cloud failures *are*
|
||||
transient, unlike the local backends which force `0`).
|
||||
- Default `cfg.Model` to the current coding model **read from
|
||||
config**, not a baked constant.
|
||||
|
||||
2. **Construction switch.** Add `case "minimax": return
|
||||
openaicompat.NewMiniMax(cfg)` to `createProvider`
|
||||
(`cmd/gnoma/main.go:1265-1280`). If `[provider.minimax].api =
|
||||
"anthropic"`, route to `anthropicprov.New(cfg)` with `cfg.BaseURL`
|
||||
set to the anthropic-compat host instead.
|
||||
|
||||
3. **Provider enumerations.** Add `"minimax"` to:
|
||||
- the known-providers set (`main.go:233-236`),
|
||||
- the available-providers usage string (`main.go:1279`),
|
||||
- NOT the local-providers set (it is a cloud arm).
|
||||
|
||||
4. **API key (optional friendliness).** `envKeyFor`'s default already
|
||||
yields `MINIMAX_API_KEY`. Add an explicit `case "minimax"` in
|
||||
`envKeyFor` (`main.go:1189-1201`) only if we want alternates (e.g.
|
||||
`MINIMAX_GROUP_ID` if the account requires a group id header —
|
||||
VERIFY whether MiniMax needs a group id alongside the key; if so,
|
||||
thread it through `ProviderConfig.Options`).
|
||||
|
||||
5. **Family defaults.** Add MiniMax model families to
|
||||
`knownFamilyDefaults` in `internal/router/defaults.go` (pattern at
|
||||
`defaults.go:212-239`). Cloud arm → no `MaxComplexity` ceiling. Set
|
||||
`Strengths` (`TaskGeneration`, `TaskRefactor`, `TaskDebug` are the
|
||||
coding sweet spot) and a low `CostWeight` (~0.8–1.0 — cheap arm, so
|
||||
the cost penalty is small) plus `CostPer1kInput/Output` from the
|
||||
verified PAYG pricing.
|
||||
|
||||
6. **Rate limits.** Add a `minimaxDefaults()` entry in
|
||||
`internal/provider/ratelimits.go` (pattern at the anthropic block
|
||||
~`ratelimits.go:109-130`) and wire it into the `DefaultRateLimits`
|
||||
switch. Use the published PAYG RPM/TPM; allow `[rate_limits.minimax]`
|
||||
config overrides (the existing override path in `resolveRateLimitPools`).
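The defaulting rules from step 1 can be sketched in isolation. `ProviderConfig` below is a stand-in with only the three fields the rules touch, not the real `provider.ProviderConfig`, and `applyMiniMaxDefaults` is a hypothetical helper that a real `NewMiniMax` might call before delegating to the OpenAI-compat base provider.

```go
package main

import "errors"

const defaultMiniMaxBaseURL = "https://api.minimax.io/v1"

// ProviderConfig is a reduced stand-in for provider.ProviderConfig.
type ProviderConfig struct {
	APIKey  string
	BaseURL string
	Model   string
}

// applyMiniMaxDefaults enforces the step-1 rules: a real API key is
// required, an unset BaseURL falls back to the international endpoint, and
// an unset Model falls back to the value read from config (passed in as
// configuredModel rather than baked in as a constant).
func applyMiniMaxDefaults(cfg ProviderConfig, configuredModel string) (ProviderConfig, error) {
	if cfg.APIKey == "" {
		return cfg, errors.New("minimax: API key required (set MINIMAX_API_KEY)")
	}
	if cfg.BaseURL == "" {
		cfg.BaseURL = defaultMiniMaxBaseURL
	}
	if cfg.Model == "" {
		cfg.Model = configuredModel
	}
	return cfg, nil
}
```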
|
||||
|
||||
### Phase 2 — subscription (Token Plan) billing model
|
||||
|
||||
The router's `CostWeight` math assumes metered per-token pricing. Under
|
||||
a Token Plan subscription, marginal cost is ≈0 until the quota is hit,
|
||||
then requests hard-fail. Design:
|
||||
|
||||
1. **Billing knob.** `[provider.minimax].billing = "metered" |
|
||||
"subscription"` (default `"metered"`). In `subscription` mode, set
|
||||
the arm's `CostWeight` to 0 (or `CostPer1k*` to 0) so the selector
|
||||
treats MiniMax as free while quota remains.
|
||||
|
||||
2. **Quota-exhaustion failover.** MiniMax returns a quota/429 error
|
||||
when the plan is exhausted. Map it to the existing rate-limit
|
||||
backoff path (`Arm.BackoffUntil`, the 429 handling that already
|
||||
disables an arm temporarily) so the bandit fails over to the next
|
||||
arm cleanly. This ties into the session error-recovery work landed
|
||||
in `0d3d190`. Confirm the exact error shape MiniMax returns and add
|
||||
a classifier in `internal/provider/errors.go`.
|
||||
|
||||
3. **Docs.** Document both plans + the region split in
|
||||
`docs/slm-backends.md` (or a new provider doc) and the README
|
||||
provider list.
|
||||
|
||||
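The two Phase 2 behaviours above can be sketched as follows. This is a minimal illustration, not the router's code: `Arm` is a trimmed, hypothetical stand-in for the type in `internal/router`, `applyBilling` and `isQuotaError` are invented helper names, and the error matching is a placeholder until the real MiniMax error shape has been confirmed.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Arm is a hypothetical, trimmed stand-in for the router's arm type.
type Arm struct {
	Name            string
	CostWeight      float64
	CostPer1kInput  float64
	CostPer1kOutput float64
	BackoffUntil    time.Time
}

// applyBilling zeroes the arm's cost terms in subscription mode so the
// selector treats it as free while quota remains. The plan allows zeroing
// either CostWeight or CostPer1k*; this sketch zeroes both. Any other
// billing value leaves the metered defaults untouched.
func applyBilling(a *Arm, billing string) {
	if billing == "subscription" {
		a.CostWeight = 0
		a.CostPer1kInput = 0
		a.CostPer1kOutput = 0
	}
}

// isQuotaError is a placeholder classifier. Both the 429 status check and
// the "quota" substring match are assumptions, not the verified API shape.
func isQuotaError(status int, body string) bool {
	return status == 429 || strings.Contains(strings.ToLower(body), "quota")
}

func main() {
	arm := Arm{Name: "minimax", CostWeight: 0.9, CostPer1kInput: 0.001, CostPer1kOutput: 0.002}
	applyBilling(&arm, "subscription")
	fmt.Println(arm.CostWeight, arm.CostPer1kInput, arm.CostPer1kOutput)

	if isQuotaError(429, "plan quota exhausted") {
		// The five-minute window is illustrative only.
		arm.BackoffUntil = time.Now().Add(5 * time.Minute)
	}
	fmt.Println("backed off:", !arm.BackoffUntil.IsZero())
}
```

Keeping the classifier separate from the backoff step means the existing 429 path can be reused unchanged once the error shape is known.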
---
|
||||
|
||||
## Touch-points (file:line)
|
||||
|
||||
| Change | Location |
|
||||
|---|---|
|
||||
| `NewMiniMax` constructor | `internal/provider/openaicompat/provider.go` (after `:49`) |
|
||||
| Construction switch case | `cmd/gnoma/main.go:1265-1280` |
|
||||
| Known-providers set | `cmd/gnoma/main.go:233-236` |
|
||||
| Usage string | `cmd/gnoma/main.go:1279` |
|
||||
| `envKeyFor` (optional) | `cmd/gnoma/main.go:1189-1201` |
|
||||
| Family defaults | `internal/router/defaults.go:212-239` |
|
||||
| Rate-limit defaults | `internal/provider/ratelimits.go` (+ `DefaultRateLimits` switch) |
|
||||
| Error classifier (Phase 2) | `internal/provider/errors.go` |
|
||||
| Config: `[provider.minimax]` | `internal/config/config.go` (provider section) |
|
||||
|
||||
The `Provider` interface contract to satisfy
|
||||
(`internal/provider/provider.go:136-148`): `Stream`, `Name`, `Models`,
|
||||
`DefaultModel`. All four come free by delegating to the OpenAI-compat
|
||||
base provider.
|
||||
|
||||
---
|
||||
|
||||
## Testing (TDD — write first)
|
||||
|
||||
Per CLAUDE.md: table-driven, `//go:build integration` for anything
|
||||
hitting the live API.
|
||||
|
||||
- **Unit (no network):**
|
||||
- `NewMiniMax` defaults: empty `BaseURL` → `https://api.minimax.io/v1`;
|
||||
empty key → error; `[provider.endpoints].minimax` override wins.
|
||||
- `createProvider("minimax", …)` returns a non-nil provider; unknown
|
||||
still errors.
|
||||
- `envKeyFor("minimax") == "MINIMAX_API_KEY"`.
|
||||
- `defaults.go`: a MiniMax model family resolves to the expected
|
||||
`Strengths`/`CostWeight`; `MaxComplexity == 0`.
|
||||
- `ratelimits.go`: `DefaultRateLimits("minimax").LookupModel(...)`
|
||||
returns the configured limits; `"*"` fallback works.
|
||||
- Phase 2: billing=`subscription` → arm `CostWeight == 0`; the
|
||||
quota/429 error maps to a retryable/backoff classification.
|
||||
- **Integration (`//go:build integration`, real `MINIMAX_API_KEY`):**
|
||||
a one-shot `Stream` against the cheapest model returns tokens;
|
||||
`Models(ctx)` enumerates a non-empty list.
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
1. `MINIMAX_API_KEY=… gnoma --provider minimax -p "hello"` streams a
|
||||
response in pipe mode.
|
||||
2. With no `--provider`, MiniMax appears as a selectable router arm and
|
||||
is chosen for a cheap generation task when `prefer` allows cloud.
|
||||
3. `gnoma providers` lists `minimax`.
|
||||
4. Phase 2: with `billing="subscription"`, the selector prefers MiniMax
|
||||
for eligible tasks; on simulated quota-exhaustion the router fails
|
||||
over without surfacing an error to the user.
|
||||
|
||||
---
|
||||
|
||||
## TODO linkage
|
||||
|
||||
Replaces the inline "MiniMax provider" bullet in `TODO.md` (In flight).
|
||||
Link this file from that entry.
|
||||
@@ -0,0 +1,328 @@
|
||||
# models.dev as source of truth for model specs & pricing — 2026-06-04
|
||||
|
||||
Adopts **models.dev** as the objective-facts source for model names,
|
||||
context windows, output limits, modalities, capabilities, and pricing —
|
||||
feeding `provider.Capabilities` and `Arm.CostPer1k{Input,Output}` — while
|
||||
gnoma's `internal/router/defaults.go` keeps the *subjective* routing
|
||||
policy. Prices are user-overridable via config.
|
||||
|
||||
Adds the TODO.md entry "models.dev as source of truth for model specs".
|
||||
|
||||
Reference: <https://github.com/anomalyco/models.dev> ·
|
||||
API: `https://models.dev/api.json` (also `models.json`, `catalog.json`).
|
||||
MIT-licensed, community-contributed TOML, served as static JSON.
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
gnoma scatters model facts across hardcoded tables:
|
||||
|
||||
- **Capabilities** (context window, max output, vision, tool use) are
|
||||
baked into each provider's `Models()` — e.g.
|
||||
`internal/provider/openai/provider.go:120-241` has per-model
|
||||
`ContextWindow`/`MaxOutput` literals.
|
||||
- **Pricing** is largely **absent**. `Arm.CostPer1k{Input,Output}` exist
|
||||
(`internal/router/arm.go:63-64`, used by `arm.go:96`) and there is a
|
||||
seam to populate them — `Router.RegisterProvider(..., costs map[string]
|
||||
[2]float64)` at `internal/router/router.go:393,418` — but it has **no
|
||||
production caller**. Arms are built via `RegisterArm` in
|
||||
`cmd/gnoma/main.go:527,559,932` with per-token price left at zero. So
|
||||
the cost-aware bandit math runs on mostly-empty data today.
|
||||
- **Routing policy** (`MaxComplexity`, `Strengths`, `CostWeight`,
|
||||
`SizeCaps`) lives in `internal/router/defaults.go:53+` — benchmark-
|
||||
derived judgments, manually refreshed (last snapshot 2026-05-23).
|
||||
|
||||
These tables drift: new models ship, prices change, gnoma's literals go
|
||||
stale. models.dev solves exactly the *objective* half of this and is
|
||||
designed to be consumed as static JSON.
|
||||
|
||||
### The seam (this is the whole spec)
|
||||
|
||||
models.dev supplies **facts**; gnoma keeps **opinions**. Clean split:
|
||||
|
||||
| Field | Source after this change |
|
||||
|---|---|
|
||||
| context window, max output, modalities, tool-use, reasoning/thinking, knowledge cutoff, status (deprecated/beta) | **models.dev** → `provider.Capabilities` |
|
||||
| input/output token price | **models.dev** → `Arm.CostPer1k{Input,Output}` (with user override) |
|
||||
| `MaxComplexity`, `Strengths`, `CostWeight`, `SizeCaps`, `Disabled` | **`defaults.go` stays** — models.dev has no opinion on these |
|
||||
|
||||
`defaults.go` is **augmented, not replaced.** It loses nothing; it gains
|
||||
accurate facts to apply its policy against.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Replacing `internal/router/defaults.go`.** The subjective routing
|
||||
policy stays hand-curated.
|
||||
- **A live dependency on models.dev at runtime.** gnoma stays offline-
|
||||
first: a vendored snapshot ships in the binary; refresh is explicit and
|
||||
opt-in (no phone-home).
|
||||
- **Letting models.dev override user config.** User `[provider]` /
|
||||
`[arms]` / price overrides always win over the dataset.
|
||||
- **Importing models.dev's TOML format.** Consume the published
|
||||
`api.json`; don't vendor their per-model TOML tree.
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
### Data ingestion (`internal/modelsdb`)
|
||||
|
||||
New package owning the dataset:
|
||||
|
||||
```
|
||||
internal/modelsdb/
|
||||
modelsdb.go // typed view: Lookup(provider, model) -> ModelSpec
|
||||
schema.go // structs matching models.dev api.json
|
||||
snapshot.go // //go:embed vendored snapshot (offline default)
|
||||
refresh.go // fetch + validate + write user-cache copy
|
||||
convert.go // ModelSpec -> provider.Capabilities + per-1k cost
|
||||
```
|
||||
|
||||
- **`schema.go`** maps the models.dev shape: per-provider, per-model
|
||||
`name`, `cost.input`/`cost.output` (USD **per million tokens**),
|
||||
`limit.context`/`limit.output`, `modalities.input`,
|
||||
`tool_call`/`reasoning` flags, `knowledge`, `status`.
|
||||
- **`snapshot.go`** embeds a checked-in `api.json` snapshot via
|
||||
`//go:embed` so a fresh binary works fully offline with sane defaults.
|
||||
- **`refresh.go`** implements `gnoma models refresh`: fetch `api.json`,
|
||||
validate, write to `~/.config/gnoma/models.dev.json`. Load order at
|
||||
startup: **user cache → embedded snapshot** (newest wins; user config
|
||||
overrides both, see below).
|
||||
|
||||
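A rough sketch of what `schema.go` and `Lookup` might look like. The JSON field names follow the list above; the provider-keyed top level and the `models` key are assumptions to verify against the published `api.json`, and the fixture values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Model mirrors the per-model fields listed in this plan.
type Model struct {
	Name string `json:"name"`
	Cost struct {
		Input  float64 `json:"input"`  // USD per million tokens
		Output float64 `json:"output"` // USD per million tokens
	} `json:"cost"`
	Limit struct {
		Context int `json:"context"`
		Output  int `json:"output"`
	} `json:"limit"`
	Modalities struct {
		Input []string `json:"input"`
	} `json:"modalities"`
	ToolCall  bool   `json:"tool_call"`
	Reasoning bool   `json:"reasoning"`
	Knowledge string `json:"knowledge"`
	Status    string `json:"status"`
}

// Provider groups models by id. The "models" key is an assumption.
type Provider struct {
	Models map[string]Model `json:"models"`
}

// Parse decodes a provider-keyed dataset. encoding/json ignores unknown
// fields and leaves missing ones at their zero value by default.
func Parse(data []byte) (map[string]Provider, error) {
	var db map[string]Provider
	if err := json.Unmarshal(data, &db); err != nil {
		return nil, fmt.Errorf("parse models dataset: %w", err)
	}
	return db, nil
}

// Lookup returns the spec for provider/model, if present.
func Lookup(db map[string]Provider, provider, model string) (Model, bool) {
	p, ok := db[provider]
	if !ok {
		return Model{}, false
	}
	m, ok := p.Models[model]
	return m, ok
}

func main() {
	// Made-up fixture, including a field the structs do not know about.
	fixture := []byte(`{"minimax":{"models":{"MiniMax-M2":{
		"name":"MiniMax M2","cost":{"input":0.3,"output":1.2},
		"limit":{"context":200000},"tool_call":true,"future_field":1}}}}`)
	db, err := Parse(fixture)
	if err != nil {
		panic(err)
	}
	m, ok := Lookup(db, "minimax", "MiniMax-M2")
	fmt.Println(ok, m.Name, m.Cost.Input, m.Limit.Context, m.Limit.Output)
}
```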
### Unit & currency conversion (`convert.go`) — easy to get wrong
|
||||
|
||||
models.dev prices are **USD per million tokens**; gnoma's
|
||||
`Arm.CostPer1k{Input,Output}` is per-1k. Two transforms, kept distinct:
|
||||
|
||||
1. **Unit: ÷ 1000** (per-million → per-1k). Always applied,
|
||||
currency-independent. **This step gets an explicit unit test.**
|
||||
2. **Currency: convert USD → the user's display currency** (see below).
|
||||
|
||||
`Arm.CostPer1k*` is stored in the **user's configured currency**; the
|
||||
unit comment in `arm.go:96` is updated from "EUR per 1k" to
|
||||
"per 1k, in `[models].currency`".
|
||||
|
||||
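The two transforms can be kept as separate functions so the unit step is testable on its own. A minimal sketch; the function names are hypothetical and the numbers in `main` are illustrative, not real prices or rates.

```go
package main

import "fmt"

// PerMillionToPer1k converts a per-million-token price to a per-1k price.
// It is currency-independent: only the token unit changes.
func PerMillionToPer1k(perMillion float64) float64 {
	return perMillion / 1000
}

// ConvertCurrency applies a USD->target rate. A rate of 1.0 is the
// identity, which is the USD case.
func ConvertCurrency(usd, rate float64) float64 {
	return usd * rate
}

// CostPer1k chains both steps: unit first, then currency.
func CostPer1k(usdPerMillion, rate float64) float64 {
	return ConvertCurrency(PerMillionToPer1k(usdPerMillion), rate)
}

func main() {
	// Illustrative input: 3.00 USD per million tokens, rate 0.92.
	fmt.Printf("USD per 1k: %.5f\n", PerMillionToPer1k(3.00))
	fmt.Printf("converted per 1k: %.5f\n", CostPer1k(3.00, 0.92))
}
```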
Capabilities map directly and are currency-independent:
|
||||
`limit.context → ContextWindow`, `limit.output → MaxOutput`,
|
||||
`tool_call → ToolUse`, `modalities.input contains image → Vision`,
|
||||
`reasoning → ThinkingModes`.
|
||||
|
||||
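The mapping above could look roughly like this. Both structs are hypothetical, flattened stand-ins: the real `provider.Capabilities` may model `ThinkingModes` as something richer than a bool, and the `"image"` modality string is assumed from the description above.

```go
package main

import "fmt"

// ModelSpec is a hypothetical flattened view of one dataset entry.
type ModelSpec struct {
	ContextLimit    int
	OutputLimit     int
	ToolCall        bool
	Reasoning       bool
	InputModalities []string
}

// Capabilities is a stand-in for provider.Capabilities.
type Capabilities struct {
	ContextWindow int
	MaxOutput     int
	ToolUse       bool
	Vision        bool
	ThinkingModes bool // assumption: the real field may not be a bool
}

// ToCapabilities applies the field-by-field mapping. No currency or unit
// conversion is involved.
func ToCapabilities(s ModelSpec) Capabilities {
	caps := Capabilities{
		ContextWindow: s.ContextLimit,
		MaxOutput:     s.OutputLimit,
		ToolUse:       s.ToolCall,
		ThinkingModes: s.Reasoning,
	}
	for _, m := range s.InputModalities {
		if m == "image" {
			caps.Vision = true
			break
		}
	}
	return caps
}

func main() {
	spec := ModelSpec{
		ContextLimit:    128000,
		OutputLimit:     4096,
		ToolCall:        true,
		InputModalities: []string{"text", "image"},
	}
	fmt.Printf("%+v\n", ToCapabilities(spec))
}
```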
### Configurable display currency + daily FX rate (`fx.go`)
|
||||
|
||||
The display currency is **user-configurable** (USD, EUR, GBP, …).
|
||||
models.dev is the USD source of truth; conversion is layered on top:
|
||||
|
||||
- **`[models].currency`** sets the target (default `EUR` to match the
|
||||
historical field; `USD` is the no-op identity).
|
||||
- **Daily FX rate, fetched on launch.** On startup gnoma checks a cached
|
||||
rate (`~/.config/gnoma/fx-rate.json`); if it is older than today
|
||||
(date-stamped, day-granular), it fetches a fresh USD→`currency` rate
|
||||
from a configurable FX endpoint (`[models].fx_source`), updates the
|
||||
cache, and applies it. The fetch is **non-blocking and best-effort**:
|
||||
on failure (offline, endpoint down) gnoma keeps the last cached rate
|
||||
and logs a one-line notice — it never blocks launch or errors out.
|
||||
- **Disable toggle.** `[models].currency_conversion = false` turns the
|
||||
whole feature off: **no FX fetch, no network call, prices shown in
|
||||
USD** (models.dev native). This is also the implied state when
|
||||
`currency = "USD"`.
|
||||
- **Rate provenance.** The cached `fx-rate.json` records the rate, the
|
||||
date fetched, and the source, so `gnoma models` / `gnoma doctor` can
|
||||
show "prices in EUR @ 0.92 USD→EUR (2026-06-04, ecb)" and flag a stale
|
||||
rate. A user may also pin a **fixed rate** (`[models].fx_rate = 0.92`)
|
||||
to skip fetching entirely while still displaying a non-USD currency.
|
||||
|
||||
FX rate precedence (highest first): **pinned `fx_rate` → today's cached
|
||||
fetch → last good cached fetch → `1.0` (USD identity) with a warning**.
|
||||
The FX endpoint host joins the egress allowlist baseline alongside
|
||||
`models.dev`.
|
||||
|
||||
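The precedence chain above is small enough to express as one pure function. A sketch under stated assumptions: `FXCache` and its fields are hypothetical, dates are `YYYY-MM-DD` strings to keep the day-granular comparison trivial, and the fetch itself is left out.

```go
package main

import "fmt"

// FXCache is a hypothetical shape for the cached fx-rate.json.
type FXCache struct {
	Rate   float64
	Date   string // day-granular, e.g. "2026-06-04"
	Source string
}

// NeedsFetch reports whether a fresh rate should be requested on launch.
func NeedsFetch(pinned float64, cache *FXCache, today string) bool {
	if pinned > 0 {
		return false // a pinned rate skips fetching entirely
	}
	return cache == nil || cache.Date != today
}

// ResolveRate applies the precedence: pinned rate, then today's cached
// fetch, then the last good cached fetch, then 1.0 with a warning.
func ResolveRate(pinned float64, cache *FXCache, today string) (rate float64, origin string, warn bool) {
	switch {
	case pinned > 0:
		return pinned, "pinned", false
	case cache != nil && cache.Rate > 0 && cache.Date == today:
		return cache.Rate, "cached-today", false
	case cache != nil && cache.Rate > 0:
		return cache.Rate, "cached-stale", false
	default:
		return 1.0, "identity", true
	}
}

func main() {
	cache := &FXCache{Rate: 0.92, Date: "2026-06-03", Source: "ecb"}
	rate, origin, warn := ResolveRate(0, cache, "2026-06-04")
	fmt.Println(rate, origin, warn, NeedsFetch(0, cache, "2026-06-04"))
}
```

Because the function never touches the network, a failed fetch simply leaves the cache as it was and the same call still returns a usable rate.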
### Wiring into arm construction
|
||||
|
||||
The existing seam is `RegisterProvider(..., costs)` (`router.go:393`).
|
||||
Two integration options (Open Questions):
|
||||
|
||||
- **A (preferred):** at arm registration in `cmd/gnoma/main.go:527+`,
|
||||
enrich each arm from `modelsdb.Lookup(provider, model)` — set
|
||||
`CostPer1k*` from the converted price and **fill any zero-valued
|
||||
Capabilities** the provider's `Models()` didn't supply. Provider
|
||||
`Models()` literals become a fallback for models models.dev doesn't
|
||||
list, not the primary source.
|
||||
- **B:** route everything through `RegisterProvider`'s `costs` map by
|
||||
building it from `modelsdb`. Cleaner but requires switching `main.go`
|
||||
off direct `RegisterArm`.
|
||||
|
||||
Either way, **`defaults.go` applies on top unchanged** (longest-prefix
|
||||
family match for `MaxComplexity`/`Strengths`/`CostWeight`).
|
||||
|
||||
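For reference, a longest-prefix family match can be sketched in a few lines. This is an illustration of the technique only: `Policy`, `MatchFamily`, and the family names are hypothetical and do not reflect the actual table in `defaults.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is a hypothetical subset of the per-family routing policy.
type Policy struct {
	MaxComplexity float64
	CostWeight    float64
}

// MatchFamily returns the entry whose key is the longest prefix of model.
// The bool is false when no family prefix matches.
func MatchFamily(table map[string]Policy, model string) (string, Policy, bool) {
	best, found := "", false
	for prefix := range table {
		if strings.HasPrefix(model, prefix) && (!found || len(prefix) > len(best)) {
			best, found = prefix, true
		}
	}
	if !found {
		return "", Policy{}, false
	}
	return best, table[best], true
}

func main() {
	// Invented family names and values, for illustration only.
	table := map[string]Policy{
		"example":       {MaxComplexity: 0.5, CostWeight: 1.0},
		"example-coder": {MaxComplexity: 0.8, CostWeight: 0.9},
	}
	family, policy, ok := MatchFamily(table, "example-coder-30b")
	fmt.Println(family, policy, ok)
}
```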
### User-configurable cost (required)
|
||||
|
||||
Prices are not one-size-fits-all: subscription plans make marginal cost
|
||||
~0 until quota (the MiniMax Coding Plan case in the provider TODO),
|
||||
negotiated enterprise rates differ, and local models are free. The
|
||||
models.dev price is the **default**, overridable per arm:
|
||||
|
||||
```toml
|
||||
[models]
|
||||
refresh = "manual" # manual | never (never = embedded snapshot only)
|
||||
currency = "EUR" # display currency; USD = identity (no conversion)
|
||||
currency_conversion = true # false → no FX fetch, prices shown in USD
|
||||
fx_source = "https://..." # daily USD→currency rate endpoint (egress-allowlisted)
|
||||
# fx_rate = 0.92 # optional: pin a fixed rate, skip daily fetch
|
||||
|
||||
# Per-arm / per-model price override — wins over models.dev.
|
||||
# Override prices are interpreted in [models].currency.
|
||||
[[provider.cost]]
|
||||
arm = "minimax/MiniMax-M2"
|
||||
billing = "subscription" # zeroes marginal cost while quota remains
|
||||
# or explicit metered numbers (per 1k, in [models].currency):
|
||||
[[provider.cost]]
|
||||
arm = "anthropic/claude-..."
|
||||
input_per_1k = 0.0028
|
||||
output_per_1k = 0.014
|
||||
```
|
||||
|
||||
Precedence (highest first): **user `[[provider.cost]]` override →
|
||||
models.dev (unit-converted + currency-converted) → provider `Models()`
|
||||
fallback → zero**. Both input *and* output prices flow through the same
|
||||
unit ÷1000 and currency conversion. The
|
||||
`billing = "subscription"` knob ties into the open MiniMax billing
|
||||
question (TODO "MiniMax provider") and zeroes the arm's effective cost
|
||||
while quota remains; once the quota is exhausted, the 429 triggers failover. Local arms
|
||||
(`IsLocal`) default to zero cost regardless of dataset.
|
||||
|
||||
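The price precedence can be sketched as a single resolver. Names and struct shapes are hypothetical. One ordering choice here is an assumption rather than something this plan states: an explicit user override is allowed to win even for a local arm, and `IsLocal` zeroes the price only ahead of the dataset and fallback.

```go
package main

import "fmt"

// Price holds per-1k prices, already in the configured currency.
type Price struct {
	InPer1k  float64
	OutPer1k float64
}

// Override is a hypothetical view of one [[provider.cost]] entry.
type Override struct {
	Billing string // "subscription" zeroes marginal cost
	Price   *Price // explicit metered numbers, if given
}

// EffectivePrice resolves the price and reports which source supplied it.
func EffectivePrice(isLocal bool, ov *Override, dataset, fallback *Price) (Price, string) {
	switch {
	case ov != nil && ov.Billing == "subscription":
		return Price{}, "override"
	case ov != nil && ov.Price != nil:
		return *ov.Price, "override"
	case isLocal:
		return Price{}, "local"
	case dataset != nil:
		return *dataset, "models.dev"
	case fallback != nil:
		return *fallback, "fallback"
	default:
		return Price{}, "zero"
	}
}

func main() {
	dataset := &Price{InPer1k: 0.003, OutPer1k: 0.015}
	explicit := &Override{Price: &Price{InPer1k: 0.0028, OutPer1k: 0.014}}

	fmt.Println(EffectivePrice(false, explicit, dataset, nil))
	fmt.Println(EffectivePrice(false, nil, dataset, nil))
	fmt.Println(EffectivePrice(true, nil, dataset, nil))
}
```

Returning the source alongside the price gives `gnoma models` the provenance column for free.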
### Offline-first & egress
|
||||
|
||||
- The embedded snapshot means **zero network calls** unless the user runs
|
||||
`gnoma models refresh`.
|
||||
- `models.dev` becomes a curated host in the egress allowlist baseline
|
||||
(`2026-06-04-egress-allowlist.md` ships package + provider hosts; add
|
||||
`models.dev`), so even refresh stays inside the firewall policy.
|
||||
- `gnoma doctor` (shipped `cmd/gnoma/doctor_cmd.go`) gains a check:
|
||||
snapshot age, models referenced in config but absent from the dataset,
|
||||
and prices that look stale vs the dataset.
|
||||
|
||||
### Surfacing
|
||||
|
||||
- `gnoma models` lists resolved arms with their effective price + caps +
|
||||
source (`models.dev` / `override` / `fallback`) — analogous to
|
||||
`gnoma providers`.
|
||||
- The TUI status line / model picker can show context window and
|
||||
price-per-turn estimates now that the data is reliable
|
||||
(`internal/tui/rendering.go:551-620`, ties to the TUI/UX plan).
|
||||
|
||||
---
|
||||
|
||||
## Touch-points (file:line)
|
||||
|
||||
| Change | Location |
|
||||
|---|---|
|
||||
| New dataset package | new `internal/modelsdb/` |
|
||||
| Embedded snapshot | `internal/modelsdb/snapshot.go` (`//go:embed api.json`) |
|
||||
| Daily FX fetch + cache | new `internal/modelsdb/fx.go`, `~/.config/gnoma/fx-rate.json`, called on launch near config load `cmd/gnoma/main.go:131-166` |
|
||||
| `gnoma models` / `models refresh` subcommand | `cmd/gnoma/main.go:179-196`; new `cmd/gnoma/models_cmd.go` |
|
||||
| Capabilities struct (target) | `internal/provider/provider.go:94` |
|
||||
| Per-model cap literals (become fallback) | `internal/provider/openai/provider.go:120-241` (+ peers) |
|
||||
| Cost fields + math | `internal/router/arm.go:63-64,96` |
|
||||
| Cost seam | `internal/router/router.go:393,418` |
|
||||
| Arm enrichment at registration | `cmd/gnoma/main.go:527,559,932` |
|
||||
| Routing policy (unchanged, applied on top) | `internal/router/defaults.go:53+` |
|
||||
| Config: `[models]`, `[[provider.cost]]` | `internal/config/config.go` |
|
||||
| doctor checks (snapshot + FX-rate staleness) | `cmd/gnoma/doctor_cmd.go`, `internal/config/doctor.go` |
|
||||
| Egress hosts (`models.dev` + `fx_source`) | `2026-06-04-egress-allowlist.md` baseline |
|
||||
|
||||
---
|
||||
|
||||
## Testing (TDD — write first)
|
||||
|
||||
- **Schema parse:** `api.json` (a fixture slice) unmarshals into
|
||||
`schema.go` structs; unknown fields ignored; missing optional fields
|
||||
tolerated.
|
||||
- **Unit conversion (critical):** a known models.dev entry (USD/million)
|
||||
converts to the expected USD/1k — guards the ÷1000 step independently
|
||||
of currency.
|
||||
- **Currency conversion:** USD/1k → EUR/1k given a rate; `currency="USD"`
|
||||
and `currency_conversion=false` are both identity (no conversion,
|
||||
prices in USD); a pinned `fx_rate` is used verbatim. Output and input
|
||||
prices both convert.
|
||||
- **Daily FX fetch:** a cache dated today is reused (no fetch); a stale
|
||||
cache triggers a fetch against a stub endpoint and updates the cache;
|
||||
a failed fetch falls back to the last good cached rate (and to `1.0`
|
||||
with a warning if none) — launch never blocks or errors.
|
||||
- **Capability mapping:** `tool_call`→`ToolUse`, image modality→`Vision`,
|
||||
`limit.context`→`ContextWindow`, `reasoning`→`ThinkingModes`.
|
||||
- **Override precedence:** user `[[provider.cost]]` beats models.dev;
|
||||
models.dev beats provider fallback; `billing="subscription"` zeroes
|
||||
marginal cost; `IsLocal` arms are free regardless of dataset.
|
||||
- **defaults.go untouched:** an arm enriched from models.dev still gets
|
||||
its `MaxComplexity`/`Strengths`/`CostWeight` from the family table
|
||||
(longest-prefix match), and a model *absent* from models.dev still
|
||||
works via provider `Models()` fallback.
|
||||
- **Offline:** with no user cache and network blocked, the embedded
|
||||
snapshot fully populates arms (no network call attempted).
|
||||
- **Refresh:** `models refresh` against a stub server writes a valid
|
||||
user cache; a malformed response is rejected and the prior cache /
|
||||
snapshot is retained (no corruption).
|
||||
- **doctor:** flags a config-referenced model missing from the dataset
|
||||
and a stale snapshot.
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
1. A fresh binary populates context window, max output, vision, tool-use,
|
||||
and price for known models **offline** from the embedded snapshot.
|
||||
2. `gnoma models` shows each arm's effective caps + price + source.
|
||||
3. `gnoma models refresh` updates the dataset within the egress policy;
|
||||
offline default unchanged without it.
|
||||
4. User `[[provider.cost]]` overrides (explicit price or
|
||||
`billing="subscription"`) win over models.dev; local arms are free.
|
||||
5. `internal/router/defaults.go` policy still applies on top, unchanged.
|
||||
6. A model not in models.dev still works via the provider's `Models()`
|
||||
fallback.
|
||||
7. Unit (÷1000) and currency conversion are correct and unit-tested.
|
||||
8. Display currency is user-configurable; the FX rate is fetched daily on
|
||||
launch (best-effort, non-blocking), cached, and shown with provenance.
|
||||
9. `currency_conversion = false` (or `currency = "USD"`) disables the FX
|
||||
fetch entirely and shows prices in USD.
|
||||
|
||||
---
|
||||
|
||||
## Open questions (resolve at implementation)
|
||||
|
||||
- **FX rate source** — which `fx_source` endpoint ships as the default
|
||||
(ECB daily reference rates are free, EUR-based, no key; others need an
|
||||
API key). Pick a keyless default; document overriding it. The daily
|
||||
cadence is day-granular (date-stamped cache), not intraday.
|
||||
- **Currency field unit** — `Arm.CostPer1k*` now stores the user's
|
||||
display currency (was nominally EUR). Confirm no other code assumes the
|
||||
field is EUR; update the `arm.go:96` comment. Cost-comparison math in
|
||||
the bandit is currency-agnostic (all arms share one currency) so
|
||||
selection is unaffected.
|
||||
- **Integration point** — enrich arms in-place at `main.go` (Option A,
|
||||
preferred, smaller diff) vs route through `RegisterProvider`'s `costs`
|
||||
map (Option B, cleaner seam). Decide when touching `main.go`.
|
||||
- **Endpoint choice** — `api.json` (full) vs `models.json` (provider-
|
||||
agnostic) vs `catalog.json`. Lean toward `api.json`; the snapshot makes size
|
||||
a non-issue.
|
||||
- **Refresh cadence** — manual-only (chosen, no-phone-home posture) vs an
|
||||
opt-in periodic check. Default manual; never auto.
|
||||
- **Snapshot freshness in CI** — whether a CI job re-vendors the embedded
|
||||
`api.json` on a schedule so shipped binaries don't drift. Likely yes;
|
||||
separate chore.
|
||||
- **MaxComplexity from benchmarks** — models.dev has no complexity
|
||||
opinion; if it ever adds benchmark data, revisit whether `defaults.go`
|
||||
could derive `MaxComplexity`. Out of scope now.
|
||||
|
||||
---
|
||||
|
||||
## TODO linkage
|
||||
|
||||
New "models.dev as source of truth for model specs" entry in `TODO.md`
|
||||
(In flight) links here. Augments (does not replace) `defaults.go`:
|
||||
models.dev supplies objective facts → `provider.Capabilities` +
|
||||
`Arm.CostPer1k*`; prices are user-overridable via `[[provider.cost]]`
|
||||
(intersects the MiniMax subscription-billing question); display currency
|
||||
is configurable with a daily best-effort FX rate fetched on launch
|
||||
(disable → USD); offline-first via an embedded snapshot; `models.dev` and
|
||||
the FX source join the egress allowlist baseline.
|
||||
@@ -0,0 +1,312 @@
|
||||
# Multi-Agent Engineering Forge (MAEF) — 2026-06-04
|
||||
|
||||
A deterministic, language-agnostic pipeline orchestrator that decouples
|
||||
**Context Mapping → Code Generation → Deterministic Validation →
|
||||
Cross-Vendor Critique** into a stateful state machine with strict
|
||||
programmatic gates and loop-back. Shipped as `gnoma forge`.
|
||||
|
||||
Adds the TODO.md entry "Multi-Agent Engineering Forge (MAEF)".
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
gnoma's single-turn agentic loop (`internal/engine/loop.go:88` `runLoop`)
|
||||
is excellent for interactive work but couples four concerns the user's
|
||||
MAEF spec wants separated: planning, generation, deterministic
|
||||
validation, and semantic critique. The MAEF design's core claim is that
|
||||
**transitions between stages are governed by programmatic gates, not LLM
|
||||
choices** — a state machine, not a mega-prompt. That maps almost exactly
|
||||
onto machinery gnoma already owns; the only genuinely new package is the
|
||||
sandbox.
|
||||
|
||||
The mapping (this is the whole spec — reuse, don't duplicate):
|
||||
|
||||
| MAEF concept | gnoma reality |
|
||||
|---|---|
|
||||
| Deterministic orchestrator with programmatic gates | A **Go state machine** in new `internal/forge` — not an LLM, not the engine's tool-driven loop |
|
||||
| Agent 1 Context Planner (LLM) | An **elf** (`elf.Manager.SpawnWithProvider`, `internal/elf/manager.go:153`), read-only tools, JSON output |
|
||||
| Agent 2 Forge Agent (LLM) | An **elf** that emits a unified diff (`diff -u`) as text |
|
||||
| Agent 3 Sandbox Gate (**non-LLM**) | A plain Go function over a new `internal/sandbox` — **not** an elf |
|
||||
| Agent 4 Adversarial Critic (LLM) | An **elf pinned to a different vendor/arm** than Forge (`router.ForceArm`) |
|
||||
| Unified Model Intermediary | gnoma's existing `provider.Provider` + `router` |
|
||||
| Ephemeral Docker workspace | git-**worktree** default; docker an optional backend behind one interface |
|
||||
|
||||
The LLM stages are elfs (each its own `engine.Engine`, system prompt,
|
||||
and routed arm). The gates between them are deterministic Go. Making
|
||||
that split explicit is what keeps this from becoming a parallel system
|
||||
bolted next to the engine.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Replacing the interactive TUI / pipe modes.** `gnoma forge` is a new
|
||||
batch/headless entry mode alongside them.
|
||||
- **Replacing the engine's `runLoop`.** Each elf still runs the normal
|
||||
loop internally; MAEF orchestrates *between* elfs.
|
||||
- **A general workflow engine.** The pipeline is fixed (Plan → Forge →
|
||||
Sandbox → Critic with loop-back); arbitrary DAGs are out of scope.
|
||||
- **Docker as a hard dependency.** Worktree is the default backend so the
|
||||
static-binary, no-daemon posture holds; docker is opt-in.
|
||||
- **LLM-driven control flow.** Stage transitions are Go code with status
|
||||
codes, never a model deciding "what next".
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
### Entry mode: `gnoma forge`
|
||||
|
||||
New subcommand following the established dispatch pattern
|
||||
(`cmd/gnoma/main.go:179-196`, peers `doctor`/`config`/`router`): add
|
||||
`case "forge": os.Exit(runForgeCommand(...))` and a `forge_cmd.go`.
|
||||
Inputs: a spec (file or stdin) + the user prompt. Reuses the same
|
||||
config/router/security/elf-manager construction as TUI/pipe; only the
|
||||
front-end orchestration differs.
|
||||
|
||||
```
|
||||
gnoma forge --spec ./spec.md "add rate-limit middleware to the auth router"
|
||||
gnoma forge --spec ./spec.md --max-iters 5 --critic-arm anthropic/...
|
||||
```
|
||||
|
||||
### Package layout
|
||||
|
||||
```
|
||||
internal/forge/
|
||||
forge.go // state machine: states, transitions, the run loop
|
||||
planner.go // Stage 1 elf: context map (read-only tools, JSON out)
|
||||
forger.go // Stage 2 elf: emit unified diff
|
||||
critic.go // Stage 4 elf: semantic critique, cross-vendor arm
|
||||
state.go // Iteration state, feedback history, terminal-failure handling
|
||||
prompts.go // System prompts per stage (constraints from MAEF §2)
|
||||
internal/sandbox/
|
||||
sandbox.go // Sandbox interface (the only genuinely new abstraction)
|
||||
worktree.go // default backend: git worktree + host exec
|
||||
docker.go // optional backend (build tag / config-gated)
|
||||
config.go // WorkspaceConfiguration contract (setup/validate/test)
|
||||
```
|
||||
|
||||
The Stage-3 gate is a function in `forge.go` that calls `internal/sandbox`
|
||||
— deliberately **not** a file in the elf/agent layer, to keep "non-LLM"
|
||||
honest.
|
||||
|
||||
### The state machine (`forge.go`)
|
||||
|
||||
States and the **programmatic** transitions between them:
|
||||
|
||||
```
|
||||
PLAN ─► FORGE ─► SANDBOX ─┬─[exit≠0]─► FORGE (sandbox_error, bypass critic)
|
||||
└─[exit=0]─► CRITIC ─┬─[reject]─► FORGE (critic_critique)
|
||||
└─[APPROVED]─► DONE
|
||||
guards: iter < max_iters; patch applies cleanly; worktree state consistent
|
||||
terminal failures ─► ABORT (revert worktree to last good commit)
|
||||
```
|
||||
|
||||
- **Gate after Sandbox:** if the sandbox exit code is non-zero, capture
|
||||
stdout/stderr verbatim and route it back to Forge as a priority
|
||||
`sandbox_error` — **the Critic is bypassed entirely** (MAEF §2.3). On
|
||||
exit 0, package the applied diff + logs and advance to Critic.
|
||||
- **Gate after Critic:** `STATUS: APPROVED` (exact sentinel) → DONE; any
|
||||
other output is parsed as a `critic_critique` and looped back to Forge.
|
||||
- **Loop budget:** hard `--max-iters` ceiling (default 5) so the pipeline
|
||||
always terminates. Each iteration carries the feedback history forward
|
||||
(`state.go`), and the Forge prompt is instructed to prioritise the most
|
||||
recent `sandbox_error` / `critic_critique` over new additions
|
||||
(MAEF §2.2).
|
||||
|
||||
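The gates above reduce to a short loop. The following is a simplified sketch rather than the planned `forge.go`: the stages are plain function values, the PLAN stage and patch-apply failures are omitted, trimming whitespace around the sentinel is an assumption, and returning `Abort` when the budget runs out is one possible reading of "terminates".

```go
package main

import (
	"fmt"
	"strings"
)

// State enumerates the pipeline states from the diagram.
type State int

const (
	Plan State = iota
	Forge
	Sandbox
	Critic
	Done
	Abort
)

const approved = "STATUS: APPROVED"

// Stages holds stub-friendly stage functions (hypothetical signatures).
type Stages struct {
	Forge   func(feedback []string) string
	Sandbox func(patch string) (exitCode int, logs string)
	Critic  func(patch, logs string) string
}

// Run drives Forge -> Sandbox -> Critic with loop-back until approval or
// until maxIters is exhausted. It returns the final state and the number
// of iterations used.
func Run(s Stages, maxIters int) (State, int) {
	var feedback []string
	for iter := 1; iter <= maxIters; iter++ {
		patch := s.Forge(feedback)
		code, logs := s.Sandbox(patch)
		if code != 0 {
			// Non-zero exit: loop back to Forge, Critic is bypassed.
			feedback = append(feedback, "sandbox_error: "+logs)
			continue
		}
		verdict := s.Critic(patch, logs)
		if strings.TrimSpace(verdict) == approved {
			return Done, iter
		}
		feedback = append(feedback, "critic_critique: "+verdict)
	}
	return Abort, maxIters
}

func main() {
	attempts := 0
	s := Stages{
		Forge: func(feedback []string) string { return "patch" },
		Sandbox: func(patch string) (int, string) {
			attempts++
			if attempts == 1 {
				return 1, "go vet failed"
			}
			return 0, "ok"
		},
		Critic: func(patch, logs string) string { return approved },
	}
	state, iters := Run(s, 5)
	fmt.Println(state == Done, iters)
}
```

Because every transition is a plain branch on an exit code or a string comparison, the loop can be tested with stubs and no model calls.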
### Stage 1 — Context Planner (elf)
|
||||
|
||||
`manager.Spawn(ctx, taskType, prompt, plannerSystemPrompt, maxTurns)`
|
||||
(`internal/elf/manager.go:65`) with **read-only tools only** (`fs.read`,
|
||||
grep/glob, gated via the engine's allowed-tools list in
|
||||
`internal/engine/loop.go` `TurnOptions`). System prompt (`prompts.go`)
|
||||
enforces the MAEF §2.1 constraints: do not write code; emit JSON with
|
||||
`targets` / `dependencies` / `rationale`. Output parsed against a schema;
|
||||
a malformed map is a retry, then a terminal failure.
|
||||
|
||||
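Parsing and validating the planner output might look like the sketch below. The key names come from the paragraph above; the Go field types, the "at least one target" rule, and the `PlanWithRetry` helper are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ContextMap is a hypothetical shape for the planner's JSON output.
type ContextMap struct {
	Targets      []string `json:"targets"`
	Dependencies []string `json:"dependencies"`
	Rationale    string   `json:"rationale"`
}

// ParseContextMap decodes the planner output and applies a minimal
// validity check (assumed rule: at least one target is required).
func ParseContextMap(raw string) (ContextMap, error) {
	var cm ContextMap
	if err := json.Unmarshal([]byte(raw), &cm); err != nil {
		return ContextMap{}, fmt.Errorf("malformed context map: %w", err)
	}
	if len(cm.Targets) == 0 {
		return ContextMap{}, errors.New("context map has no targets")
	}
	return cm, nil
}

// PlanWithRetry asks the planner up to attempts times and returns a
// terminal error if every answer is malformed.
func PlanWithRetry(ask func() string, attempts int) (ContextMap, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		cm, err := ParseContextMap(ask())
		if err == nil {
			return cm, nil
		}
		lastErr = err
	}
	return ContextMap{}, fmt.Errorf("planner failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	answers := []string{
		"sorry, here is some prose",
		`{"targets":["internal/auth/router.go"],"dependencies":[],"rationale":"middleware hook"}`,
	}
	i := 0
	cm, err := PlanWithRetry(func() string { a := answers[i]; i++; return a }, 2)
	fmt.Println(cm.Targets, err)
}
```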
### Stage 2 — Forge Agent (elf)
|
||||
|
||||
Ingests the context map + source of mapped files + spec + accumulated
|
||||
feedback. System prompt enforces MAEF §2.2: **emit only a unified diff**
|
||||
(`diff -u`), no prose, never a full file when a partial edit suffices.
|
||||
The diff is **applied via `git apply` inside the sandbox worktree** —
|
||||
*not* the `fs.edit` string-replace tool (`internal/tool/fs/edit.go`).
|
||||
This matches the user's `diff -u` contract and is atomic/cleanly
|
||||
reversible. A corrupt patch is rejected immediately and the raw
|
||||
`git apply` error is fed straight back to Forge (MAEF §2.3 rule 1).
|
||||
|
||||
### Stage 3 — Deterministic Sandbox Gate (non-LLM)
|
||||
|
||||
A Go function, not an elf. Backed by `internal/sandbox`:
|
||||
|
||||
```go
|
||||
type Sandbox interface {
|
||||
Apply(patch []byte) error // git apply in the workspace
|
||||
Run(step string) (Result, error) // setup / validate / test command
|
||||
Revert() error // back to last good commit
|
||||
WorkDir() string
|
||||
Cleanup() error
|
||||
}
|
||||
```
|
||||
|
||||
- **Default backend `worktree.go`:** create a detached git worktree off
|
||||
the current commit (`git worktree add`), apply the patch there, run the
|
||||
lifecycle commands on the host. Fits the static-binary, no-daemon
|
||||
posture — and is the same isolation primitive the agent harness itself
|
||||
uses. On terminal failure, run `git worktree remove` / reset (the user's
|
||||
infinite-loop guard: state-sync errors are terminal, revert to last
|
||||
good commit).
|
||||
- **Optional backend `docker.go`:** the same interface over an ephemeral
|
||||
container, gated by config/build-tag, honouring the user's
|
||||
`WorkspaceConfiguration` YAML (`base_image`, `setup`, `validate`,
|
||||
`test`). Swapping backends never touches `forge.go`.
|
||||
- **Lifecycle contract (`config.go`)** mirrors the MAEF YAML:
|
||||
`setup` (e.g. `go mod download` / `npm ci`), `validate`
|
||||
(`go vet` / `cargo check` / `npm run lint`), `test`
|
||||
(`go test ./...` / `jest --findRelatedTests`). Language-agnostic —
|
||||
commands come from `[forge.sandbox]` config or are auto-detected from
|
||||
the project (reuse the `SessionStart` project-type detection already in
|
||||
the repo).
|
||||
|
||||
### Stage 4 — Adversarial Critic (elf, **cross-vendor**)
|
||||
|
||||
The headline of the user's spec. The Critic must be a **different
|
||||
vendor/arm than the Forge** so the critique is genuinely independent, not
|
||||
the same model grading itself.
|
||||
|
||||
- Spawn via `manager.SpawnWithProvider(prov, model, …)`
|
||||
(`internal/elf/manager.go:153`) with the arm chosen by
|
||||
`router.ForceArm` (`internal/router/router.go:147`) so forge-arm ≠
|
||||
critic-arm is **enforced**, not hoped for. If only one vendor is
|
||||
configured, log a clear degraded-mode warning (critique still runs,
|
||||
independence not guaranteed).
|
||||
- Inputs: original spec, applied patch, sandbox logs. System prompt
|
||||
enforces MAEF §2.4: **forbidden from writing code/patches**; evaluates
|
||||
performance, security surface, spec alignment; emits structured
|
||||
markdown pointers or the exact sentinel `STATUS: APPROVED`.
|
||||
|
||||
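The cross-vendor rule can be checked before any elf is spawned. A minimal sketch, assuming arm ids have the `vendor/model` form used elsewhere in this plan; `vendorOf` and `PickCriticArm` are hypothetical helpers, and the arm names in `main` are placeholders. The result would then be handed to `router.ForceArm`.

```go
package main

import (
	"fmt"
	"strings"
)

// vendorOf returns the part of an arm id before the first "/".
func vendorOf(arm string) string {
	vendor, _, _ := strings.Cut(arm, "/")
	return vendor
}

// PickCriticArm returns the first candidate whose vendor differs from the
// Forge arm's vendor. If none exists it falls back to the Forge arm and
// reports degraded mode so the caller can log a warning.
func PickCriticArm(forgeArm string, candidates []string) (critic string, degraded bool) {
	forgeVendor := vendorOf(forgeArm)
	for _, arm := range candidates {
		if vendorOf(arm) != forgeVendor {
			return arm, false
		}
	}
	return forgeArm, true
}

func main() {
	arms := []string{"vendor-a/model-1", "vendor-a/model-2", "vendor-b/model-3"}
	critic, degraded := PickCriticArm("vendor-a/model-1", arms)
	fmt.Println(critic, degraded)

	critic, degraded = PickCriticArm("vendor-a/model-1", arms[:2])
	fmt.Println(critic, degraded)
}
```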
### Security & safety interplay

The sandbox runs **AI-generated patches and tests** — a real execution
surface. All existing boundaries still apply:

- `safety.ClassifyCWD` runs before the forge starts; a `refuse`
  classification aborts.
- Every elf's provider is `security.WrapProvider`-wrapped
  (`internal/security/safeprovider.go:33`) exactly like interactive arms,
  so firewall + audit + egress allowlist
  (`2026-06-04-egress-allowlist.md`) hold across all stages.
- Sandbox command execution goes through the same `permission` /
  validation discipline as the `bash` tool
  (`internal/tool/bash/bash.go` `ValidateCommand`); in headless forge
  mode the permission posture is config-driven (default: deny network in
  sandbox unless the lifecycle commands need a declared host).
- Terminal state-sync failures **revert the worktree** and abort rather
  than looping — directly addresses the MAEF §3 infinite-error-loop risk.

### Unified Model Intermediary

The MAEF "unified completion interface" already exists as
`provider.Provider` (`internal/provider/provider.go:136`) behind the
router. MiniMax / Anthropic / local Ollama (the user's diagram's three
backends) are just arms. No new abstraction — `prompts.go` + the elf's
`request` is the `request_completion(system, prompt, schema)` surface.

---

## Touch-points (file:line)

| Change | Location |
|---|---|
| `forge` subcommand dispatch | `cmd/gnoma/main.go:179-196`; new `cmd/gnoma/forge_cmd.go` |
| State machine + gates | new `internal/forge/forge.go`, `state.go` |
| Planner / Forger / Critic elfs | new `internal/forge/{planner,forger,critic,prompts}.go` |
| Elf spawn (generic + arm-pinned) | `internal/elf/manager.go:65,153` |
| Cross-vendor enforcement | `internal/router/router.go:147` (`ForceArm`) |
| Read-only tool gating for Planner | `internal/engine/loop.go` `TurnOptions` (AllowedTools) |
| Sandbox abstraction | new `internal/sandbox/{sandbox,worktree,docker,config}.go` |
| Patch apply (git, not fs.edit) | `internal/sandbox/worktree.go` (`git apply`) |
| Command validation reuse | `internal/tool/bash/bash.go` `ValidateCommand` |
| CWD classification | `internal/safety` `ClassifyCWD` |
| Provider wrapping | `internal/security/safeprovider.go:33` |
| Config section | `internal/config/config.go` (new `[forge]` + `[forge.sandbox]`) |

---

## Testing (TDD — write first)

- **State machine (no LLM, no real sandbox):** drive `forge.go` with a
  stub planner/forger/critic and a fake sandbox returning scripted exit
  codes. Assert:
  - sandbox exit≠0 routes back to Forge and **bypasses** Critic;
  - sandbox exit=0 advances to Critic;
  - Critic `STATUS: APPROVED` → DONE; any other output → loop to Forge;
  - `--max-iters` is a hard ceiling (terminates, returns last state);
  - a corrupt patch / worktree desync is **terminal** → revert + abort,
    never an infinite loop.
- **Sandbox (worktree backend):** in a `t.TempDir()` git repo, apply a
  valid patch (succeeds), a corrupt patch (clean rejection with raw
  error surfaced), run a failing `validate` (non-zero captured), and a
  passing one; `Revert` restores the last good commit.
- **Cross-vendor guard:** with two arms configured, assert forge-arm ≠
  critic-arm; with one arm, assert the degraded-mode warning fires and
  the pipeline still runs.
- **Planner schema:** valid JSON parses into `targets`/`dependencies`;
  malformed output retries then fails terminally; planner cannot invoke
  a write tool (allowed-tools gate).
- **Forger output discipline:** non-diff output (prose) is rejected
  before reaching the sandbox.
- **Integration (`//go:build integration`):** end-to-end `gnoma forge`
  on a fixture repo with a trivial spec, real arms, real worktree —
  produces an applied, test-passing, critic-approved patch.

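The state-machine assertions above can be exercised against a loop shaped roughly like the following. It is a minimal sketch, assuming hypothetical names (`Stages`, `Run`, `Outcome`); the real `internal/forge` types are not shown in this plan and may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// Outcome is the terminal state of one forge run (hypothetical type).
type Outcome string

const (
	Done      Outcome = "done"      // Critic emitted the approval sentinel
	Exhausted Outcome = "exhausted" // the max-iters ceiling was reached
	Aborted   Outcome = "aborted"   // corrupt patch or worktree desync
)

const approvedSentinel = "STATUS: APPROVED"

// Stages bundles stub-able hooks for the three stages; signatures are assumptions.
type Stages struct {
	Forge   func(feedback string) (patch string)
	Sandbox func(patch string) (exitCode int, logs string, terminal bool)
	Critic  func(patch, logs string) (verdict string)
}

// Run drives Forge -> Sandbox -> Critic for at most maxIters iterations and
// returns the outcome plus the number of iterations used.
func Run(s Stages, maxIters int) (Outcome, int) {
	feedback := ""
	for i := 1; i <= maxIters; i++ {
		patch := s.Forge(feedback)
		code, logs, terminal := s.Sandbox(patch)
		if terminal {
			return Aborted, i // caller reverts the worktree; never loop on this
		}
		if code != 0 {
			feedback = logs // raw logs go back to Forge; the Critic is bypassed
			continue
		}
		verdict := s.Critic(patch, logs)
		if strings.TrimSpace(verdict) == approvedSentinel {
			return Done, i
		}
		feedback = verdict // any other Critic output loops back to Forge
	}
	return Exhausted, maxIters
}

func main() {
	exitCodes := []int{1, 0} // scripted sandbox results: fail once, then pass
	step, criticCalls := 0, 0
	s := Stages{
		Forge: func(string) string { return "patch" },
		Sandbox: func(string) (int, string, bool) {
			code := exitCodes[step]
			step++
			return code, "logs", false
		},
		Critic: func(string, string) string {
			criticCalls++
			return approvedSentinel
		},
	}
	outcome, iters := Run(s, 5)
	fmt.Println(outcome, iters, criticCalls) // prints "done 2 1"
}
```

Because the validation gate is plain Go rather than an LLM call, each assertion in the list reduces to a scripted `Sandbox`/`Critic` stub and a check on the returned outcome.
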
### Acceptance criteria

1. `gnoma forge --spec … "<prompt>"` runs Plan → Forge → Sandbox →
   Critic to either an approved patch or a clean bounded failure.
2. A failing sandbox loops back to Forge with raw logs and **never**
   reaches the Critic that iteration.
3. The Critic runs on a different vendor/arm than the Forge (or warns).
4. Patches apply via `git apply` in an isolated worktree; the user's
   working tree is untouched until the final approved patch is offered.
5. A corrupt patch or worktree desync aborts with a revert — no infinite
   loop.
6. Docker backend is selectable via config without changing `forge.go`.
7. All firewall / audit / egress / CWD-classification boundaries apply to
   every stage.

---

## Open questions (resolve at implementation)

- **Sandbox backend default** — git-worktree (chosen: no daemon, fits
  static binary) vs docker-ephemeral (the user's diagram's default).
  Worktree default; docker the swappable backend.
- **Final patch delivery** — auto-apply the approved patch to the user's
  tree, or leave it staged in the worktree / emit it as a `.patch` for
  the user to apply. Lean: emit + offer to apply (never silently mutate
  the working tree).
- **Critic arm selection** — explicit `--critic-arm` vs automatic "pick
  the highest-quality arm from a different vendor than Forge". Support
  both; auto by default.
- **Lifecycle command source** — `[forge.sandbox]` config vs
  auto-detection from project type. Auto-detect with config override.
- **Planner/Forger/Critic as router task-types** — whether to add
  `TaskPlan` / `TaskCritique` `TaskType`s so the bandit can learn
  per-stage arm quality, or pin arms explicitly. Start pinned; add
  task-types if telemetry justifies (ties to the bandit-design TODO).
- **Relationship to the `agent` tool / elf orchestration** — MAEF is a
  fixed pipeline; the existing `internal/tool/agent` fan-out stays for
  interactive sub-agent spawning. Keep them separate.

---

## TODO linkage

New "Multi-Agent Engineering Forge (MAEF)" entry in `TODO.md` (In
flight) links here. Builds on the engine, elf manager, router
(`ForceArm` for cross-vendor critique), and security boundaries; the
only new abstraction is `internal/sandbox` (worktree default, docker
optional). The deterministic orchestrator lives in `internal/forge` as a
Go state machine — the LLM stages are elfs, the validation gate is not.
@@ -0,0 +1,230 @@
# TUI/UX refresh — opencode-inspired patterns — 2026-06-04

Closes concrete UX gaps in gnoma's existing Bubble Tea TUI by borrowing
proven interaction patterns from **opencode** (peer AI-coding TUI) and the
layout/component philosophy of **opentui**.

Adds the TODO.md entry "TUI/UX refresh — opencode-inspired patterns".

References:

- opencode — <https://github.com/anomalyco/opencode> (UX patterns to mine).
- opentui — <https://github.com/anomalyco/opentui> (component/layout
  *concepts* only — see "What we do **not** borrow" below).

---

## Problem

gnoma already ships a capable Bubble Tea v2 TUI
(`internal/tui/`, launched from `cmd/gnoma/main.go:109-115,1151-1172`):
themes (`theme.go:30-106`), pickers, slash commands
(`completions.go:17-46`), vim mode (`app.go:378-422`), an elf-progress
tree (`rendering.go:373-456`), a three-segment status line
(`rendering.go:551-620`), and permission-mode cycling
(`app.go:643-668`). This is **not greenfield** — it is gap-closing.

opencode is the closest peer (a terminal-first agentic coder) and has
converged on a handful of UX patterns gnoma lacks or under-serves. This
plan ports those patterns onto the existing `internal/tui/*` surface,
mapping each to the file:line it touches. Nothing here rewrites the TUI;
each item is an additive refinement.

### What we do **not** borrow

opentui is a **Zig core with TypeScript bindings** (C-ABI, SolidJS/React
reconcilers, WebGPU targets). None of it is consumable from gnoma's
Go + Bubble Tea stack. We take exactly two *concepts* from it and write
them in Go:

1. **Layout primitives over manual string-joining.** opentui leans on a
   flexbox layout engine; gnoma's `rendering.go` hand-assembles regions
   with `lipgloss.JoinVertical/Horizontal`. We formalise a small
   region/pane layout helper rather than adopting any opentui code.
2. **Core-vs-bindings split.** Keep render-state (the "what") separate
   from lipgloss styling (the "how"), so themes and future render
   targets don't fork the view logic.

We do **not** add a reconciler, a second render target, WebGPU, or any
non-Go dependency. opentui stays inspiration, not import.

---

## Non-goals

- **A rewrite of the Bubble Tea model.** `app.go`'s `Model`/`Update`/
  `View` stay; every item is additive.
- **A second render backend** (web/WebGPU). The `gnoma web` milestone
  (M15) is tracked separately; this plan is terminal-only.
- **A client/server split.** opencode runs a TS server behind its TUI;
  gnoma is a single static binary and stays that way. The session-share
  item below is export/import, not a hosted service.
- **Replacing glamour markdown rendering.** We refine how diffs and tool
  output render, not the markdown engine.

---

## Design — patterns, each mapped to the existing TUI

### 1. Agent / mode switch on a single key (opencode `Tab`)

opencode toggles **plan** (read-only, asks before bash) vs **build**
(full access) with `Tab`. gnoma already *has* the underlying machine —
`permission.Mode` (bypass / deny / plan / accept_edits / auto) cycled
via Shift+Tab (`app.go:643-668`). The gap is discoverability and a
first-class "plan vs do" framing.

- Promote **plan** and **accept_edits/auto** to a labelled two-state
  toggle surfaced in the status line (`rendering.go:551-620`), with the
  full five-mode cycle still on Shift+Tab. Reuse `ModeColor`
  (`theme.go:164-171`) for the indicator.
- No new permission semantics — pure presentation over the existing
  `permission.Checker`.

### 2. Leader-key command palette

Today slash commands are typed (`/model`, `/theme`, …) with completion
(`completions.go:17-46`, `app.go:1188-1500+`). opencode adds a
leader-key palette for the same actions without typing `/`.

- Add a leader key (default `Ctrl+K`, configurable) that opens the
  existing picker overlay machinery (`app.go:339-366`,
  `rendering.go:126-148`) pre-populated with the `builtinCommands`
  source. This is a new *entry point* to existing pickers, not a new
  widget.

### 3. External theme files (opencode-style theming)

gnoma has five built-in themes hardcoded in `theme.go:30-106`. opencode
loads user theme files. Extend, don't replace:

- Keep the five built-ins. Add loading of `*.toml`/`*.json` theme files
  from `~/.config/gnoma/themes/` and `.gnoma/themes/`, parsed into the
  existing `Theme` struct (`theme.go:13-27`) and registered into the
  `Themes` array. `/theme <name>` and the picker pick them up for free.
- The `[tui] theme` config key (`config.go:434-437`) already selects by
  name; user themes just widen the namespace.

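A rough sketch of the "extend, don't replace" loading step follows, using only the standard library and the JSON format. The `Theme` fields, `parseTheme`, and `registerUserThemes` are hypothetical stand-ins; the real struct in `theme.go` is not shown here, and TOML support would need a TOML decoder this sketch leaves out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Theme is a trimmed-down, hypothetical stand-in for the struct in theme.go.
type Theme struct {
	Name   string `json:"name"`
	Accent string `json:"accent"`
	Error  string `json:"error"`
}

// parseTheme decodes one user theme file and rejects incomplete themes.
func parseTheme(data []byte) (Theme, error) {
	var t Theme
	if err := json.Unmarshal(data, &t); err != nil {
		return Theme{}, fmt.Errorf("parse theme: %w", err)
	}
	if t.Name == "" || t.Accent == "" || t.Error == "" {
		return Theme{}, fmt.Errorf("theme %q: name, accent and error are required", t.Name)
	}
	return t, nil
}

// registerUserThemes appends every valid user theme to a copy of the
// built-ins and collects an error per malformed file instead of panicking.
func registerUserThemes(builtins []Theme, files map[string][]byte) ([]Theme, []error) {
	themes := append([]Theme(nil), builtins...)
	var errs []error
	for path, data := range files {
		t, err := parseTheme(data)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", path, err))
			continue
		}
		themes = append(themes, t)
	}
	return themes, errs
}

func main() {
	builtins := []Theme{{Name: "dark", Accent: "#ffffff", Error: "#ff0000"}}
	files := map[string][]byte{
		"sea.json":    []byte(`{"name":"sea","accent":"#00aaff","error":"#ff4444"}`),
		"broken.json": []byte(`{not json`),
	}
	themes, errs := registerUserThemes(builtins, files)
	fmt.Println(len(themes), len(errs)) // prints "2 1"
}
```

Collecting errors per file lets the caller surface a clear message and keep the configured built-in, which is the fallback behaviour the Testing section asks for.
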
### 4. Diff & file-tree rendering for edits

Tool results currently render generically (`rendering.go:254-371`). The
biggest visible opencode win is **syntax-aware diff rendering** for
file edits.

- Detect `fs.edit`/`fs.write` tool results (the edit tool already emits a
  diff-style payload, `internal/tool/fs/edit.go:136-191`) and render
  them as a proper red/green unified diff using theme colors, instead of
  raw text.
- Optional: a compact changed-files summary line per turn (paths +
  +/- counts), themed via the status palette.

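A minimal in-house colorizer for unified diffs might look like the sketch below. It uses raw ANSI escape codes as a placeholder for theme colors (an assumption; the plan calls for lipgloss styles from the active theme), and `colorizeDiff` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// ANSI SGR codes used as stand-ins for theme colors.
const (
	ansiRed   = "\x1b[31m"
	ansiGreen = "\x1b[32m"
	ansiCyan  = "\x1b[36m"
	ansiReset = "\x1b[0m"
)

// colorizeDiff wraps added, removed, and hunk-header lines of a unified
// diff in colors. File headers (---/+++) and context lines pass through.
// Simplification: a removed line whose content itself starts with "--" is
// treated as a file header; a real renderer would track hunk state instead.
func colorizeDiff(diff string) string {
	lines := strings.Split(diff, "\n")
	for i, line := range lines {
		switch {
		case strings.HasPrefix(line, "+++"), strings.HasPrefix(line, "---"):
			// file headers stay uncolored
		case strings.HasPrefix(line, "@@"):
			lines[i] = ansiCyan + line + ansiReset
		case strings.HasPrefix(line, "+"):
			lines[i] = ansiGreen + line + ansiReset
		case strings.HasPrefix(line, "-"):
			lines[i] = ansiRed + line + ansiReset
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	diff := "--- a/main.go\n+++ b/main.go\n@@ -1 +1 @@\n-old line\n+new line\n context"
	fmt.Println(colorizeDiff(diff))
}
```

Keeping the function a pure string-to-string transform makes the golden-string test described in the Testing section straightforward.
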
### 5. Session resume / share (export-import, no server)

opencode has session sharing via its server. gnoma's no-phone-home
posture rules out hosting, but the *resume* and *portable export* parts
fit:

- `internal/session` already persists sessions (`SessionStore`). Add a
  TUI session picker (`/sessions`) over the store + the project registry
  (`~/.config/gnoma/projects.json`, shipped in `56d7217`) for
  cross-project recency.
- "Share" becomes **export to a self-contained transcript file**
  (markdown or JSON) the user can attach anywhere — explicitly local,
  documented in the Security section.

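For the JSON variant of the transcript, the export/import pair could be sketched as follows. The `Message` and `Transcript` shapes and the `version` field are assumptions for illustration only; the real types in `internal/session` are not shown in this plan.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Message is a hypothetical, minimal chat message.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Transcript is the self-contained export envelope (assumed shape).
type Transcript struct {
	Version  int       `json:"version"`
	Messages []Message `json:"messages"`
}

// exportTranscript serialises the message list into an indented JSON file body.
func exportTranscript(msgs []Message) ([]byte, error) {
	return json.MarshalIndent(Transcript{Version: 1, Messages: msgs}, "", "  ")
}

// importTranscript parses an exported transcript and checks its version.
func importTranscript(data []byte) ([]Message, error) {
	var t Transcript
	if err := json.Unmarshal(data, &t); err != nil {
		return nil, fmt.Errorf("import transcript: %w", err)
	}
	if t.Version != 1 {
		return nil, fmt.Errorf("unsupported transcript version %d", t.Version)
	}
	return t.Messages, nil
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "rename the helper"},
		{Role: "assistant", Content: "done, see the diff"},
	}
	data, err := exportTranscript(msgs)
	if err != nil {
		fmt.Println("export failed:", err)
		return
	}
	back, err := importTranscript(data)
	if err != nil {
		fmt.Println("import failed:", err)
		return
	}
	fmt.Println(reflect.DeepEqual(msgs, back)) // prints "true"
}
```

A version field in the envelope is one way to keep old exports importable if the message shape changes later.
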
### 6. LSP-backed context (opencode parity, optional)

opencode feeds LSP diagnostics into context. This is the largest item
and is **gated** — list it so the spec is complete, but scope it as a
follow-up dependent on whether an LSP client lands in `internal/tool`.
For now: acknowledge the gap, don't build it under this plan.

### 7. Layout helper (the one opentui concept)

`rendering.go` joins regions imperatively. Introduce a tiny
`internal/tui/layout` helper expressing the chat / status / input /
overlay regions declaratively (sizes, weights, ordering) so resize
handling and overlay placement stop being ad-hoc. View logic computes a
layout tree of *regions*; lipgloss styling stays in `theme.go`. This is
the "core vs bindings" split, in Go, with zero new deps.

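One possible shape for the declarative sizing step is sketched below: fixed-height regions are reserved first, then leftover rows are split by weight. `Region` and `Layout` are hypothetical names, and the sketch covers vertical sizing only (no overlays, no layout tree).

```go
package main

import "fmt"

// Region describes one vertical slice of the screen (hypothetical type).
type Region struct {
	Name   string
	Fixed  int // rows reserved regardless of terminal size; 0 means flexible
	Weight int // share of the leftover rows for flexible regions
}

// Layout returns one height per region, in order. Fixed regions are served
// first and clamped if the terminal is too small; flexible regions split
// the remaining rows by weight. Assumes total >= 0.
func Layout(total int, regions []Region) []int {
	heights := make([]int, len(regions))
	remaining := total
	weightSum := 0
	for i, r := range regions {
		if r.Fixed > 0 {
			h := r.Fixed
			if h > remaining {
				h = remaining // clamp instead of overlapping
			}
			heights[i] = h
			remaining -= h
		} else if r.Weight > 0 {
			weightSum += r.Weight
		}
	}
	if weightSum == 0 {
		return heights
	}
	left := remaining
	last := -1
	for i, r := range regions {
		if r.Fixed > 0 || r.Weight <= 0 {
			continue
		}
		heights[i] = remaining * r.Weight / weightSum
		left -= heights[i]
		last = i
	}
	heights[last] += left // rounding leftovers go to the last flexible region
	return heights
}

func main() {
	regions := []Region{
		{Name: "chat", Weight: 1},
		{Name: "status", Fixed: 1},
		{Name: "input", Fixed: 3},
	}
	fmt.Println(Layout(24, regions)) // prints "[20 1 3]"
}
```

Since the function only computes integers, resize handling becomes a matter of calling it again with the new terminal height, and the "no overlap" test can assert that the heights never sum to more than `total`.
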
---

## Touch-points (file:line)

| Change | Location |
|---|---|
| Plan/build mode toggle + status indicator | `internal/tui/app.go:643-668`, `internal/tui/rendering.go:551-620`, `theme.go:164-171` |
| Leader-key palette entry point | `internal/tui/app.go:339-366,585-598`, `completions.go:17-46`, picker render `rendering.go:126-148` |
| External theme file loading | `internal/tui/theme.go:13-27,30-106,182-246`, config key `internal/config/config.go:434-437` |
| Diff rendering for edits | `internal/tui/rendering.go:254-371`, edit-diff source `internal/tool/fs/edit.go:136-191` |
| Session picker + transcript export | `internal/tui/app.go:1188-1500+` (new `/sessions`, `/export`), `internal/session` `SessionStore`, project registry |
| Layout helper | new `internal/tui/layout/`, consumed by `rendering.go:21-64` |
| New keybindings registry | `internal/tui/app.go:336-810` (centralise the literals), `[tui]` config |

---

## Testing (TDD — write first)

- **Theme loading:** a malformed user theme file is rejected with a
  clear error and falls back to the configured built-in (no panic).
  A valid user theme appears in the picker and `ApplyTheme` produces the
  expected styles.
- **Diff rendering:** an `fs.edit` result renders as red/green hunks;
  a non-diff tool result is unaffected (golden-string test on the
  rendered output).
- **Palette:** leader key opens the palette pre-filled with the same
  commands `completionSource` yields; selecting an item dispatches the
  identical `handleCommand` path as typing the slash command.
- **Mode toggle:** the labelled toggle and Shift+Tab cycle stay in sync
  with `permission.Checker`'s mode; the status indicator color matches
  `ModeColor`.
- **Session picker / export:** picker lists sessions from the store +
  registry ordered by recency; export produces a transcript that
  round-trips (re-import yields the same message list).
- **Layout helper:** unit tests on region sizing across terminal widths
  (narrow / wide / resize) with no overlap and correct overlay placement.
- **Render snapshots:** golden tests for `View()` at representative
  states (streaming, picker open, permission prompt) so refactors are
  caught.

### Acceptance criteria

1. `Ctrl+K` opens a command palette routing to the same actions as
   slash commands.
2. A user theme file in `~/.config/gnoma/themes/` is selectable and
   applies; built-ins unchanged.
3. File edits render as a colored unified diff in the chat.
4. A plan/build mode indicator is visible in the status line; both the
   toggle and Shift+Tab drive `permission.Checker`.
5. `/sessions` lists and resumes prior sessions across projects;
   `/export` writes a self-contained transcript.
6. No new non-Go dependency; binary stays single-static.

---

## Open questions (resolve at implementation)

- **Leader key default** — `Ctrl+K` vs leaving it config-only to avoid
  clashing with existing bindings (`app.go:336-810`). Default `Ctrl+K`,
  configurable.
- **Theme file format** — TOML (matches gnoma config) vs JSON (matches
  opencode themes, eases porting their palettes). Lean TOML; accept both.
- **opencode-vs-opentui scope** — we deliberately take UX *patterns*
  from opencode and only two layout *concepts* from opentui. If a future
  `gnoma web` target lands, revisit whether the layout helper should
  generalise toward an opentui-style region tree.
- **Diff renderer** — write a minimal in-house unified-diff colorizer vs
  pull a small Go diff-rendering lib. Prefer in-house (no dep, the edit
  tool already emits structured diffs).
- **LSP context (item 6)** — out of scope here; gate on an
  `internal/tool` LSP client landing.

---

## TODO linkage

New "TUI/UX refresh — opencode-inspired patterns" entry in `TODO.md`
(In flight) links here. Gap-closing against the existing
`internal/tui/*`; opencode supplies the UX patterns, opentui supplies
two layout concepts (re-implemented in Go, not imported).
@@ -0,0 +1,113 @@
# Implementation roadmap — 2026-06-04

Root sequencing spec for the in-flight work. Each tier is a self-contained
merge unit; tiers may overlap when plans are written by separate elfs but
the listed order is the *target* sequence.

Ties together the open items from [TODO.md §In flight](../../TODO.md)
and the 2026-06-04 plans under `docs/superpowers/plans/`.

---

## Tier 1 — Small ships, low coupling (~1-2 weeks)

| # | Plan | Depends on | Surface |
|---|---|---|---|
| 1 | [2026-06-04-config-migration-followups.md](../plans/2026-06-04-config-migration-followups.md) | — | encoder fix (Duration pointer) |
| 2 | [2026-06-04-minimax-provider.md](../plans/2026-06-04-minimax-provider.md) | — | `openaicompat` + metered billing slice |
| 3 | [2026-06-04-models-dev-source-of-truth.md](../plans/2026-06-04-models-dev-source-of-truth.md) | — | embedded snapshot + read-side wiring |

All three are provider/router-adjacent and parallelize cleanly. None
touch the engine loop. Each is a self-contained PR.

**Note on Tier 1 ordering vs. egress:** models.dev ships with the
embedded-snapshot default (per its plan). The `models refresh` wire-fetch
path is gated behind the Tier 3 egress work — that is **not** a hard
dependency for the Tier 1 ship.

## Tier 2 — UX + integration polish (~2-3 weeks, parallelizable)

| # | Plan | Depends on | Surface |
|---|---|---|---|
| 4 | [2026-06-04-tui-ux-opencode.md](../plans/2026-06-04-tui-ux-opencode.md) | — | additive on `internal/tui/*` |
| 5 | [2026-06-04-distribution-followups.md](../plans/2026-06-04-distribution-followups.md) | — | cosign, brew, dockers_v2 |

Pure polish. No engine change. Can run in parallel with Tier 1 and Tier 3.

## Tier 3 — Egress foundation (~2-3 weeks)

| # | Plan | Depends on | Surface |
|---|---|---|---|
| 6 | [2026-06-04-egress-allowlist.md](../plans/2026-06-04-egress-allowlist.md) | audit log (already shipped) | transport-layer Learn → Review → Enforce |

Blocks the wire-fetch path of models.dev refresh, future SDK egress
controls, and any future "gnoma fetches at runtime" feature.

## Tier 4 — Cross-platform Phase 1 (~1 week)

| # | Plan | Depends on | Surface |
|---|---|---|---|
| 7 | [2026-06-04-cross-platform.md](../plans/2026-06-04-cross-platform.md) (Phase 1 only) | — | release-archive smoke matrix per platform |

Per the plan: Phase 1 is the precondition for an honest r/devops post.
Phase 2 items land one-per-PR as r/devops questions surface.

**Promote to Tier 2 if r/devops is on the near-term calendar.**

## Tier 5 — New protocol / orchestration (~2-4 weeks each)

| # | Plan | Depends on | Surface |
|---|---|---|---|
| 8a | [2026-06-04-agent-client-protocol.md](../plans/2026-06-04-agent-client-protocol.md) (server side) | — | `gnoma acp` over stdio |
| 8b | [2026-06-04-agent-client-protocol.md](../plans/2026-06-04-agent-client-protocol.md) (client side) | 8a | external ACP agents as router arms |
| 9 | [2026-06-04-multi-agent-engineering-forge.md](../plans/2026-06-04-multi-agent-engineering-forge.md) | — | `internal/forge` state machine + `internal/sandbox` + 3 elfs |

ACP is split into two PRs (server-side, then client-side) — the
server-side drives editors (Zed, Kiro, OpenCode), the client-side
consumes external ACP agents as router arms. Same wire protocol, two
roles, two PRs.

**Why ACP before MAEF:** MAEF has no hard dependency on ACP, but
shipping ACP first means a future MAEF Critic can be an external ACP
agent via `router.ForceArm` instead of being locked to a gnoma elf.
**Flip to MAEF-first if MAEF is the next-release headline.**

## Tier 6 — Older open plans (May)

| Plan | Note |
|---|---|
| [2026-05-24-config-migration.md](../plans/2026-05-24-config-migration.md) | Phase 2+ (doctor already shipped in `f321dab`; project registry in `56d7217`). Follow-up plan is Tier 1 #1. |
| [2026-05-24-sensitive-content-policy.md](../plans/2026-05-24-sensitive-content-policy.md) | Cross-cuts. Held until entropy-FP telemetry (Phase F-1) observed in production. |
| [2026-05-25-encoder-bandit-router.md](../plans/2026-05-25-encoder-bandit-router.md) | Supersedes the open bandit-design question in TODO. Revisit when SLM dispatcher is in production. |
| [2026-05-23-tool-router-specialization.md](../plans/2026-05-23-tool-router-specialization.md) | Telemetry-gated at 20% did-switch rate. May never ship. |

## Shipped (carried for history)

`2026-05-19-post-slm-unlock.md`, `2026-05-23-prefer-routing-policy.md`,
`2026-05-23-routing-defaults-refresh.md`, `2026-05-23-startup-safety-banner.md`,
`2026-05-19-security-wave1-safeprovider.md`, `2026-05-19-security-wave2-incognito.md`.

## Sequencing rationale (the 3 push-back points)

1. **models.dev before egress** — the plan is explicitly offline-first
   (embedded snapshot is default). Ship the read-side plumbing first so
   every later arm addition benefits from correct pricing/caps. Refresh
   is a Phase 2 follow-up gated on Tier 3.
2. **ACP before MAEF** — see Tier 5 note. Future-proofs the MAEF Critic
   path. Flip if MAEF is the release headline.
3. **TUI/UX before distribution** — these are parallelizable, so the
   order between them is "whichever PR is ready first."

## Decision points to revisit

| Question | Effect |
|---|---|
| Is r/devops on the near-term calendar? | Promote cross-platform Phase 1 to Tier 2. |
| Is MAEF the next-release headline? | Flip Tier 5 to MAEF-then-ACP. |
| Will the SLM be running in production soon? | Promote encoder-bandit router to active. |

## Open question for the maintainer

Should the `docs/superpowers/specs/` directory become the home for
**sequencing / cross-cutting** docs (this roadmap, future triage notes)
while `plans/` stays per-feature? Currently `specs/` is empty.
+164
-86
@@ -3,27 +3,41 @@ package config
import "time"

// Config is the top-level configuration.
//
// Fields tagged with `,omitempty` are skipped by the encoder at
// their Go zero value, which is what stops `gnoma config set` from
// re-emitting zero-spam in fields the user never set. Fields where
// the zero value can be a legitimate user choice (numeric / bool
// where 0 / false is meaningful) are pointer types so nil (absent)
// and *zero (explicit) are distinguishable at resolve time — see
// Resolved() and ResolvedConfig in resolve.go.
type Config struct {
// DefaultProfile names the profile loaded when no --profile flag is
// passed. Only meaningful when ~/.config/gnoma/profiles/ exists; see
// LoadWithProfile.
DefaultProfile string `toml:"default_profile"`
DefaultProfile string `toml:"default_profile,omitempty"`

Provider ProviderSection `toml:"provider"`
Permission PermissionSection `toml:"permission"`
Tools ToolsSection `toml:"tools"`
RateLimits RateLimitSection `toml:"rate_limits"`
Security SecuritySection `toml:"security"`
Session SessionSection `toml:"session"`
SLM SLMSection `toml:"slm"`
Router RouterSection `toml:"router"`
Safety SafetySection `toml:"safety"`
CLIAgents CLIAgentsSection `toml:"cli_agents"`
Arms []ArmConfig `toml:"arms"`
Hooks []HookConfig `toml:"hooks"`
MCPServers []MCPServerConfig `toml:"mcp_servers"`
Plugins PluginsSection `toml:"plugins"`
TUI TUISection `toml:"tui"`
// Settings holds gnoma-level options that aren't tied to a
// specific section (provider, tools, etc.). Currently just the
// project-registry toggle; future home for log level, telemetry
// flags, etc.
Settings SettingsSection `toml:"config,omitempty"`

Provider ProviderSection `toml:"provider,omitempty"`
Permission PermissionSection `toml:"permission,omitempty"`
Tools ToolsSection `toml:"tools,omitempty"`
RateLimits RateLimitSection `toml:"rate_limits,omitempty"`
Security SecuritySection `toml:"security,omitempty"`
Session SessionSection `toml:"session,omitempty"`
SLM SLMSection `toml:"slm,omitempty"`
Router RouterSection `toml:"router,omitempty"`
Safety SafetySection `toml:"safety,omitempty"`
CLIAgents CLIAgentsSection `toml:"cli_agents,omitempty"`
Arms []ArmConfig `toml:"arms,omitempty"`
Hooks []HookConfig `toml:"hooks,omitempty"`
MCPServers []MCPServerConfig `toml:"mcp_servers,omitempty"`
Plugins PluginsSection `toml:"plugins,omitempty"`
TUI TUISection `toml:"tui,omitempty"`
}

// SLMSection configures the optional small language model used for task
@@ -40,14 +54,36 @@ type Config struct {
//
// See docs/slm-backends.md for copy-paste presets.
type SLMSection struct {
Enabled bool `toml:"enabled"`
Backend string `toml:"backend"` // auto | ollama | llamacpp | llamafile | openaicompat | disabled (empty = auto)
Model string `toml:"model"` // model name (ollama/llamacpp/openaicompat); ignored for llamafile
BaseURL string `toml:"base_url"` // server URL; defaults per-backend
ModelURL string `toml:"model_url"` // llamafile-only: where to download the binary from
DataDir string `toml:"data_dir"` // llamafile-only: where to put it (empty = XDG default)
ExpectedSHA256 string `toml:"expected_sha256"` // llamafile-only: verify hash if non-empty
StartupTimeout Duration `toml:"startup_timeout"` // llamafile-only: first-launch wait budget; 0 = default 5s
Enabled bool `toml:"enabled,omitempty"`
Backend string `toml:"backend,omitempty"` // auto | ollama | llamacpp | llamafile | openaicompat | disabled (empty = auto)
Model string `toml:"model,omitempty"` // model name (ollama/llamacpp/openaicompat); ignored for llamafile
BaseURL string `toml:"base_url,omitempty"` // server URL; defaults per-backend
ModelURL string `toml:"model_url,omitempty"` // llamafile-only: where to download the binary from
DataDir string `toml:"data_dir,omitempty"` // llamafile-only: where to put it (empty = XDG default)
ExpectedSHA256 string `toml:"expected_sha256,omitempty"` // llamafile-only: verify hash if non-empty
StartupTimeout *Duration `toml:"startup_timeout,omitempty"` // llamafile-only: first-launch wait budget; nil = default 5s

// ClassifyTimeout caps each task-classification call to the SLM.
// nil here means "use the built-in default" (15s). *Duration(0) is
// explicit-zero and also resolves to 0 (the SLM layer treats 0
// the same as nil via internal/slm/classifier.go). Pointer
// conversion was added in the 2026-06-04 follow-up so the encoder
// can honor omitempty — see plan file referenced in resolve.go.
ClassifyTimeout *Duration `toml:"classify_timeout,omitempty"`

// RegisterAsArm controls whether the SLM model is registered as
// a tier-0 execution arm in addition to its classifier role.
// nil (absent) → true (preserve historical behaviour: SLM is
// both classifier and an execution arm for trivial-complexity
// prompts). Explicitly false → SLM is classifier-only; trivial
// prompts route to other local arms instead.
//
// Set this to false when the SLM model is task-specialised
// (FunctionGemma, embedding-only models, code-completion-tuned
// models) and would produce wrong-shape output if asked to
// answer a general prompt. Pointer type so the absent-value
// case can be distinguished from explicit false.
RegisterAsArm *bool `toml:"register_as_arm,omitempty"`
}

// ArmConfig tunes routing for a single registered arm. Multiple [[arms]]
@@ -69,9 +105,9 @@ type SLMSection struct {
// Strength names map to router.TaskType via router.ParseTaskType — same
// names the SLM classifier emits (snake_case or no separator both work).
type ArmConfig struct {
ID string `toml:"id"`
Strengths []string `toml:"strengths"`
CostWeight float64 `toml:"cost_weight"`
ID string `toml:"id,omitempty"`
Strengths []string `toml:"strengths,omitempty"`
CostWeight float64 `toml:"cost_weight,omitempty"`
}

// CLIAgentsSection maps canonical CLI agent names to override binary names.
@@ -103,15 +139,15 @@ type SafetySection struct {
// RefuseInSystemDirs gates the refuse path. When false, system
// roots like / and /etc are treated as warn-tier instead of refuse.
// Default: true.
RefuseInSystemDirs *bool `toml:"refuse_in_system_dirs"`
RefuseInSystemDirs *bool `toml:"refuse_in_system_dirs,omitempty"`
// WarnInHome gates the warn-tier check for $HOME and common
// dumping grounds (~/Desktop, ~/Downloads, /tmp). When false,
// these all become OK-tier (banner still shown). Default: true.
WarnInHome *bool `toml:"warn_in_home"`
WarnInHome *bool `toml:"warn_in_home,omitempty"`
// RequireProjectMarker, when true, treats any directory without
// a recognized project marker as warn-tier (even inside a git
// repo). Default: false — git repo is enough by default.
RequireProjectMarker bool `toml:"require_project_marker"`
RequireProjectMarker bool `toml:"require_project_marker,omitempty"`
}

// ResolvedSafety returns the effective Safety settings with defaults
@@ -148,7 +184,11 @@ type RouterSection struct {
// arm context window. Useful for debugging or for forcing the behavior
// on a large local model. Defaults to false: two-stage activates
// automatically on local arms with context window <= 16k.
ForceTwoStage bool `toml:"force_two_stage"`
//
// Pointer so the absent-vs-explicit-false distinction is preserved
// across write/read cycles; the resolver substitutes the default
// (false) for nil. See ResolvedRouterSection in resolve.go.
ForceTwoStage *bool `toml:"force_two_stage,omitempty"`

// Prefer biases routing toward local arms ("local"), cloud arms
// ("cloud"), or leaves the tier-based selection unchanged ("auto").
@@ -156,12 +196,12 @@ type RouterSection struct {
// not hard-filter the dispreferred set. Forced arms (--provider X)
// and incognito take priority over this knob. See
// docs/superpowers/plans/2026-05-23-prefer-routing-policy.md.
Prefer string `toml:"prefer"`
Prefer string `toml:"prefer,omitempty"`

// Bandit exposes the selector's tuning knobs. Defaults preserve
// previous hard-coded behaviour exactly; only set these when you
// need to tune the EMA quality tracker for an unusual workload.
Bandit BanditSection `toml:"bandit"`
Bandit BanditSection `toml:"bandit,omitempty"`
}

// BanditSection holds the scoring knobs for the EMA quality tracker
@@ -174,23 +214,44 @@ type BanditSection struct {
// QualityAlpha is the EMA smoothing factor for arm-quality
// observations. Larger values weight recent observations more.
// Default: 0.3 (~3-sample memory). 0.0 here means "use default".
QualityAlpha float64 `toml:"quality_alpha"`
QualityAlpha float64 `toml:"quality_alpha,omitempty"`

// MinObservations is the minimum number of samples required
// before observed EMA overrides the heuristic fallback. Default:
// 3. 0 here means "use default".
MinObservations int `toml:"min_observations"`
|
||||
MinObservations int `toml:"min_observations,omitempty"`
|
||||
|
||||
// ObservedWeight is the weight of the observed EMA in the
|
||||
// observed/heuristic blend inside scoreArm: the final quality is
|
||||
// `observed*W + heuristic*(1-W)`. Default: 0.7. 0.0 here means
|
||||
// "use default".
|
||||
ObservedWeight float64 `toml:"observed_weight"`
|
||||
ObservedWeight float64 `toml:"observed_weight,omitempty"`
|
||||
|
||||
// StrengthBonus is the quality bonus added when an arm declares
|
||||
// the current task type in its Strengths list. Default: 0.15.
|
||||
// 0.0 here means "use default".
|
||||
StrengthBonus float64 `toml:"strength_bonus"`
|
||||
StrengthBonus float64 `toml:"strength_bonus,omitempty"`
|
||||
}
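
The scoring knobs documented above can be illustrated with a minimal sketch. It assumes a standard EMA update and assumes the `MinObservations` gate is inclusive (`>=`); the `knobs` type and the `withDefaults`, `updateEMA`, and `quality` helpers are hypothetical names for illustration, not the selector's actual `scoreArm` code. Only the blend formula and the default values come from the field comments.

```go
package main

import "fmt"

// knobs mirrors the four BanditSection fields; a zero value
// means "use the documented default".
type knobs struct {
	QualityAlpha    float64
	MinObservations int
	ObservedWeight  float64
	StrengthBonus   float64
}

// withDefaults substitutes the defaults stated in the field comments.
func (k knobs) withDefaults() knobs {
	if k.QualityAlpha == 0 {
		k.QualityAlpha = 0.3
	}
	if k.MinObservations == 0 {
		k.MinObservations = 3
	}
	if k.ObservedWeight == 0 {
		k.ObservedWeight = 0.7
	}
	if k.StrengthBonus == 0 {
		k.StrengthBonus = 0.15
	}
	return k
}

// updateEMA folds one sample into the running average
// (assumed form: a larger alpha weights the new sample more).
func updateEMA(prev, sample, alpha float64) float64 {
	return alpha*sample + (1-alpha)*prev
}

// quality falls back to the heuristic until enough samples exist,
// then blends observed*W + heuristic*(1-W); the strength bonus is
// added when the arm declares the current task type.
func quality(k knobs, observed, heuristic float64, samples int, declaresTask bool) float64 {
	q := heuristic
	if samples >= k.MinObservations {
		q = observed*k.ObservedWeight + heuristic*(1-k.ObservedWeight)
	}
	if declaresTask {
		q += k.StrengthBonus
	}
	return q
}

func main() {
	k := knobs{}.withDefaults()
	ema := 0.5
	for _, s := range []float64{1, 1, 1} {
		ema = updateEMA(ema, s, k.QualityAlpha)
	}
	fmt.Printf("ema=%.3f\n", ema)
	fmt.Printf("cold=%.3f warm=%.3f\n",
		quality(k, ema, 0.5, 2, false),
		quality(k, ema, 0.5, 3, true))
}
```

Because zero doubles as "use default" for these four fields, a user cannot express a literal 0 for any of them; that is the trade-off the pointer-typed fields elsewhere in this change avoid.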
|
||||
|
||||
// SettingsSection holds gnoma-level options that aren't tied to
|
||||
// a specific functional section (provider, tools, etc.). Lives
|
||||
// under `[config]` in the user's TOML file. Current fields:
|
||||
//
|
||||
// - ProjectRegistry: opt out of the ~/.config/gnoma/projects.json
|
||||
// write. nil = enabled (default true; preserves v0.3.x
|
||||
// behavior of always recording); *false = opt out.
|
||||
//
|
||||
// The file itself is purely local — never sent off-machine —
|
||||
// see README §Security. The toggle exists for users who don't
|
||||
// want the directory log kept at all.
|
||||
type SettingsSection struct {
|
||||
// ProjectRegistry controls whether gnoma writes to
|
||||
// ~/.config/gnoma/projects.json (the per-user list of
|
||||
// directories gnoma has been launched in, used by
|
||||
// `gnoma doctor --all-projects`, `gnoma upgrade-config --all`,
|
||||
// and the cross-project session picker). nil = enabled
|
||||
// (default true); *false = opt out.
|
||||
ProjectRegistry *bool `toml:"project_registry,omitempty"`
|
||||
}
|
||||
|
||||
// MCPServerConfig defines an MCP server to start and connect to.
|
||||
@@ -205,17 +266,17 @@ type BanditSection struct {
|
||||
// timeout = "30s"
|
||||
// replace_default = { exec = "bash" } # MCP tool "exec" replaces built-in "bash"
|
||||
type MCPServerConfig struct {
|
||||
Name string `toml:"name"`
|
||||
Command string `toml:"command"`
|
||||
Args []string `toml:"args"`
|
||||
Env map[string]string `toml:"env"`
|
||||
Timeout string `toml:"timeout"`
|
||||
ReplaceDefault map[string]string `toml:"replace_default"` // MCP tool name → built-in name
|
||||
ToolPolicy map[string]MCPToolPolicy `toml:"tool_policy"` // MCP tool name → policy
|
||||
Name string `toml:"name,omitempty"`
|
||||
Command string `toml:"command,omitempty"`
|
||||
Args []string `toml:"args,omitempty"`
|
||||
Env map[string]string `toml:"env,omitempty"`
|
||||
Timeout string `toml:"timeout,omitempty"`
|
||||
ReplaceDefault map[string]string `toml:"replace_default,omitempty"` // MCP tool name → built-in name
|
||||
ToolPolicy map[string]MCPToolPolicy `toml:"tool_policy,omitempty"` // MCP tool name → policy
|
||||
}
|
||||
|
||||
type MCPToolPolicy struct {
|
||||
PathArgs []string `toml:"path_args"`
|
||||
PathArgs []string `toml:"path_args,omitempty"`
|
||||
}
|
||||
|
||||
// PluginsSection controls plugin loading.
|
||||
@@ -226,8 +287,8 @@ type MCPToolPolicy struct {
|
||||
// enabled = ["git-tools", "docker-tools"]
|
||||
// disabled = ["experimental-plugin"]
|
||||
type PluginsSection struct {
|
||||
Enabled []string `toml:"enabled"`
|
||||
Disabled []string `toml:"disabled"`
|
||||
Enabled []string `toml:"enabled,omitempty"`
|
||||
Disabled []string `toml:"disabled,omitempty"`
|
||||
}
|
||||
|
||||
// HookConfig is a single hook entry from TOML config.
|
||||
@@ -243,17 +304,22 @@ type PluginsSection struct {
|
||||
// timeout = "10s"
|
||||
// fail_open = false
|
||||
type HookConfig struct {
|
||||
Name string `toml:"name"`
|
||||
Event string `toml:"event"`
|
||||
Type string `toml:"type"`
|
||||
Exec string `toml:"exec"`
|
||||
Timeout string `toml:"timeout"`
|
||||
FailOpen bool `toml:"fail_open"`
|
||||
ToolPattern string `toml:"tool_pattern"`
|
||||
Name string `toml:"name,omitempty"`
|
||||
Event string `toml:"event,omitempty"`
|
||||
Type string `toml:"type,omitempty"`
|
||||
Exec string `toml:"exec,omitempty"`
|
||||
Timeout string `toml:"timeout,omitempty"`
|
||||
FailOpen *bool `toml:"fail_open,omitempty"`
|
||||
ToolPattern string `toml:"tool_pattern,omitempty"`
|
||||
}
|
||||
|
||||
type SessionSection struct {
|
||||
MaxKeep int `toml:"max_keep"`
|
||||
// MaxKeep is the maximum number of sessions to retain. nil = use
|
||||
// default (20); *0 = explicitly disable session retention.
|
||||
// Pointer type so the absent-vs-explicit-zero distinction is
|
||||
// preserved across write/read cycles; the resolver substitutes
|
||||
// the default for nil. See ResolvedSessionSection in resolve.go.
|
||||
MaxKeep *int `toml:"max_keep,omitempty"`
|
||||
}
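
The nil-versus-explicit-zero convention used by the pointer fields in this change can be sketched as follows. `resolveInt` and `defaultMaxKeep` are hypothetical names for illustration; the actual resolver lives in resolve.go and is not shown in this diff.

```go
package main

import "fmt"

// defaultMaxKeep mirrors the documented default of 20 sessions.
const defaultMaxKeep = 20

// resolveInt is a hypothetical helper: nil means "key absent, use
// the default"; any non-nil pointer, including *0, is user intent.
func resolveInt(p *int, def int) int {
	if p == nil {
		return def
	}
	return *p
}

func main() {
	zero := 0
	fmt.Println(resolveInt(nil, defaultMaxKeep))   // key absent from the file
	fmt.Println(resolveInt(&zero, defaultMaxKeep)) // explicit max_keep = 0
}
```

With a plain `int` field both cases would decode to 0 and be indistinguishable, which is why the field type changes from `int` to `*int`.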
|
||||
|
||||
// SecuritySection configures the secret scanner and firewall.
|
||||
@@ -272,41 +338,53 @@ type SessionSection struct {
|
||||
// entropy_safelist names known-safe shapes that bypass the entropy scorer
|
||||
// (Phase F-1 FP reduction). Empty / unset preserves pre-F-1 behavior.
|
||||
type SecuritySection struct {
|
||||
EntropyThreshold float64 `toml:"entropy_threshold"`
|
||||
RedactHighEntropy bool `toml:"redact_high_entropy"`
|
||||
EntropySafelist []string `toml:"entropy_safelist"`
|
||||
Patterns []PatternConfig `toml:"patterns"`
|
||||
// EntropyThreshold is the Shannon-entropy floor above which a
|
||||
// token is treated as a possible secret. nil = use the built-in
|
||||
// default (4.5); *0 disables the entropy pre-filter entirely.
|
||||
// Pointer type so the absent-vs-explicit-zero distinction is
|
||||
// preserved across write/read cycles; the resolver substitutes
|
||||
// the default for nil. See ResolvedSecuritySection in resolve.go.
|
||||
EntropyThreshold *float64 `toml:"entropy_threshold,omitempty"`
|
||||
|
||||
// RedactHighEntropy controls whether high-entropy hits are
|
||||
// redacted in outgoing LLM traffic. nil = false (warn / block
|
||||
// only); *true enables redaction. Pointer type so the absent-
|
||||
// vs-explicit-false distinction is preserved.
|
||||
RedactHighEntropy *bool `toml:"redact_high_entropy,omitempty"`
|
||||
|
||||
EntropySafelist []string `toml:"entropy_safelist,omitempty"`
|
||||
Patterns []PatternConfig `toml:"patterns,omitempty"`
|
||||
}
|
||||
|
||||
type PatternConfig struct {
|
||||
Name string `toml:"name"`
|
||||
Regex string `toml:"regex"`
|
||||
Action string `toml:"action"` // "redact" (default), "block", "warn"
|
||||
Name string `toml:"name,omitempty"`
|
||||
Regex string `toml:"regex,omitempty"`
|
||||
Action string `toml:"action,omitempty"` // "redact" (default), "block", "warn"
|
||||
}
|
||||
|
||||
type PermissionSection struct {
|
||||
Mode string `toml:"mode"`
|
||||
Rules []PermissionRule `toml:"rules"`
|
||||
Mode string `toml:"mode,omitempty"`
|
||||
Rules []PermissionRule `toml:"rules,omitempty"`
|
||||
}
|
||||
|
||||
type PermissionRule struct {
|
||||
Tool string `toml:"tool"`
|
||||
Pattern string `toml:"pattern"`
|
||||
Action string `toml:"action"`
|
||||
Tool string `toml:"tool,omitempty"`
|
||||
Pattern string `toml:"pattern,omitempty"`
|
||||
Action string `toml:"action,omitempty"`
|
||||
}
|
||||
|
||||
type ProviderSection struct {
|
||||
Default string `toml:"default"`
|
||||
Model string `toml:"model"`
|
||||
MaxTokens int64 `toml:"max_tokens"`
|
||||
Temperature *float64 `toml:"temperature"`
|
||||
APIKeys map[string]string `toml:"api_keys"`
|
||||
Endpoints map[string]string `toml:"endpoints"`
|
||||
Default string `toml:"default,omitempty"`
|
||||
Model string `toml:"model,omitempty"`
|
||||
MaxTokens *int64 `toml:"max_tokens,omitempty"`
|
||||
Temperature *float64 `toml:"temperature,omitempty"`
|
||||
APIKeys map[string]string `toml:"api_keys,omitempty"`
|
||||
Endpoints map[string]string `toml:"endpoints,omitempty"`
|
||||
}
|
||||
|
||||
type ToolsSection struct {
|
||||
BashTimeout Duration `toml:"bash_timeout"`
|
||||
MaxFileSize int64 `toml:"max_file_size"`
|
||||
BashTimeout Duration `toml:"bash_timeout,omitempty"`
|
||||
MaxFileSize *int64 `toml:"max_file_size,omitempty"`
|
||||
}
|
||||
|
||||
// RateLimitSection allows overriding default rate limits per provider.
|
||||
@@ -326,15 +404,15 @@ type ToolsSection struct {
|
||||
type RateLimitSection map[string]RateLimitOverride
|
||||
|
||||
type RateLimitOverride struct {
|
||||
Tier string `toml:"tier"`
|
||||
RPS float64 `toml:"rps"`
|
||||
RPM int `toml:"rpm"`
|
||||
RPD int `toml:"rpd"`
|
||||
TPM int `toml:"tpm"`
|
||||
ITPM int `toml:"itpm"`
|
||||
OTPM int `toml:"otpm"`
|
||||
TokensMonth int64 `toml:"tokens_month"`
|
||||
SpendCap float64 `toml:"spend_cap"`
|
||||
Tier string `toml:"tier,omitempty"`
|
||||
RPS float64 `toml:"rps,omitempty"`
|
||||
RPM int `toml:"rpm,omitempty"`
|
||||
RPD int `toml:"rpd,omitempty"`
|
||||
TPM int `toml:"tpm,omitempty"`
|
||||
ITPM int `toml:"itpm,omitempty"`
|
||||
OTPM int `toml:"otpm,omitempty"`
|
||||
TokensMonth int64 `toml:"tokens_month,omitempty"`
|
||||
SpendCap float64 `toml:"spend_cap,omitempty"`
|
||||
}
|
||||
|
||||
// Duration wraps time.Duration for TOML string parsing (e.g. "30s", "5m").
|
||||
@@ -354,6 +432,6 @@ func (d Duration) Duration() time.Duration {
|
||||
}
|
||||
|
||||
type TUISection struct {
|
||||
Theme string `toml:"theme"`
|
||||
Vim bool `toml:"vim"`
|
||||
Theme string `toml:"theme,omitempty"`
|
||||
Vim bool `toml:"vim,omitempty"`
|
||||
}
|
||||
|
||||
@@ -5,6 +5,8 @@ import (
|
||||
"path/filepath"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/BurntSushi/toml"
|
||||
)
|
||||
|
||||
func TestDefaults(t *testing.T) {
|
||||
@@ -12,8 +14,8 @@ func TestDefaults(t *testing.T) {
|
||||
if cfg.Provider.Default != "" {
|
||||
t.Errorf("Provider.Default = %q, want empty (no default provider)", cfg.Provider.Default)
|
||||
}
|
||||
if cfg.Provider.MaxTokens != 8192 {
|
||||
t.Errorf("Provider.MaxTokens = %d", cfg.Provider.MaxTokens)
|
||||
if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 8192 {
|
||||
t.Errorf("Provider.MaxTokens = %v, want *8192", cfg.Provider.MaxTokens)
|
||||
}
|
||||
if cfg.Tools.BashTimeout.Duration() != 30*time.Second {
|
||||
t.Errorf("Tools.BashTimeout = %v", cfg.Tools.BashTimeout)
|
||||
@@ -53,8 +55,8 @@ max_file_size = 2097152
|
||||
if cfg.Provider.Model != "claude-sonnet-4" {
|
||||
t.Errorf("Provider.Model = %q", cfg.Provider.Model)
|
||||
}
|
||||
if cfg.Provider.MaxTokens != 16384 {
|
||||
t.Errorf("Provider.MaxTokens = %d", cfg.Provider.MaxTokens)
|
||||
if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 16384 {
|
||||
t.Errorf("Provider.MaxTokens = %v, want *16384", cfg.Provider.MaxTokens)
|
||||
}
|
||||
if cfg.Provider.APIKeys["anthropic"] != "sk-test-123" {
|
||||
t.Errorf("APIKeys[anthropic] = %q", cfg.Provider.APIKeys["anthropic"])
|
||||
@@ -65,8 +67,8 @@ max_file_size = 2097152
|
||||
if cfg.Tools.BashTimeout.Duration() != 60*time.Second {
|
||||
t.Errorf("Tools.BashTimeout = %v", cfg.Tools.BashTimeout)
|
||||
}
|
||||
if cfg.Tools.MaxFileSize != 2097152 {
|
||||
t.Errorf("Tools.MaxFileSize = %d", cfg.Tools.MaxFileSize)
|
||||
if cfg.Tools.MaxFileSize == nil || *cfg.Tools.MaxFileSize != 2097152 {
|
||||
t.Errorf("Tools.MaxFileSize = %v, want *2097152", cfg.Tools.MaxFileSize)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -217,7 +219,7 @@ tool_pattern = "bash*"
|
||||
if h.Timeout != "5s" {
|
||||
t.Errorf("Timeout = %q", h.Timeout)
|
||||
}
|
||||
if !h.FailOpen {
|
||||
if h.FailOpen == nil || !*h.FailOpen {
|
||||
t.Error("FailOpen should be true")
|
||||
}
|
||||
if h.ToolPattern != "bash*" {
|
||||
@@ -444,7 +446,54 @@ model = "claude-haiku"
|
||||
t.Errorf("Model = %q, want claude-haiku (from project)", cfg.Provider.Model)
|
||||
}
|
||||
// Global: max_tokens = 4096
|
||||
if cfg.Provider.MaxTokens != 4096 {
|
||||
t.Errorf("MaxTokens = %d, want 4096 (from global)", cfg.Provider.MaxTokens)
|
||||
if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 4096 {
|
||||
t.Errorf("MaxTokens = %v, want *4096 (from global)", cfg.Provider.MaxTokens)
|
||||
}
|
||||
}
|
||||
|
||||
func TestSLMSection_RegisterAsArm_AbsentDefaultsToTrue(t *testing.T) {
|
||||
// Absent field → nil pointer → caller treats as default true,
|
||||
// preserving pre-config behaviour where the SLM is always
|
||||
// registered as an execution arm.
|
||||
var cfg Config
|
||||
if _, err := toml.Decode(`[slm]
|
||||
enabled = true
|
||||
`, &cfg); err != nil {
|
||||
t.Fatalf("decode: %v", err)
|
||||
}
|
||||
if cfg.SLM.RegisterAsArm != nil {
|
||||
t.Errorf("expected nil pointer for absent register_as_arm, got %v", *cfg.SLM.RegisterAsArm)
|
||||
}
|
||||
}
|
||||
|
||||
func TestSLMSection_RegisterAsArm_ExplicitFalse(t *testing.T) {
|
||||
var cfg Config
|
||||
if _, err := toml.Decode(`[slm]
|
||||
enabled = true
|
||||
register_as_arm = false
|
||||
`, &cfg); err != nil {
|
||||
t.Fatalf("decode: %v", err)
|
||||
}
|
||||
if cfg.SLM.RegisterAsArm == nil {
|
||||
t.Fatal("expected non-nil pointer when register_as_arm is set")
|
||||
}
|
||||
if *cfg.SLM.RegisterAsArm {
|
||||
t.Errorf("expected register_as_arm=false to decode as *false, got *true")
|
||||
}
|
||||
}
|
||||
|
||||
func TestSLMSection_RegisterAsArm_ExplicitTrue(t *testing.T) {
|
||||
var cfg Config
|
||||
if _, err := toml.Decode(`[slm]
|
||||
enabled = true
|
||||
register_as_arm = true
|
||||
`, &cfg); err != nil {
|
||||
t.Fatalf("decode: %v", err)
|
||||
}
|
||||
if cfg.SLM.RegisterAsArm == nil {
|
||||
t.Fatal("expected non-nil pointer when register_as_arm is set")
|
||||
}
|
||||
if !*cfg.SLM.RegisterAsArm {
|
||||
t.Errorf("expected register_as_arm=true to decode as *true, got *false")
|
||||
}
|
||||
}
|
||||
|
||||
@@ -3,11 +3,24 @@ package config
|
||||
import "time"
|
||||
|
||||
func Defaults() Config {
|
||||
maxTokens := int64(8192)
|
||||
maxFileSize := int64(1 << 20) // 1MB
|
||||
maxKeep := 20
|
||||
entropyThreshold := 4.5
|
||||
redactHighEntropy := false
|
||||
forceTwoStage := false
|
||||
startupTimeout := Duration(5 * time.Second)
|
||||
classifyTimeout := Duration(0) // 0 = let the SLM layer pick its own 15s default
|
||||
projectRegistry := true
|
||||
|
||||
return Config{
|
||||
Settings: SettingsSection{
|
||||
ProjectRegistry: &projectRegistry,
|
||||
},
|
||||
Provider: ProviderSection{
|
||||
Default: "",
|
||||
Model: "",
|
||||
MaxTokens: 8192,
|
||||
MaxTokens: &maxTokens,
|
||||
APIKeys: make(map[string]string),
|
||||
Endpoints: make(map[string]string),
|
||||
},
|
||||
@@ -16,11 +29,19 @@ func Defaults() Config {
|
||||
},
|
||||
Tools: ToolsSection{
|
||||
BashTimeout: Duration(30 * time.Second),
|
||||
MaxFileSize: 1 << 20, // 1MB
|
||||
MaxFileSize: &maxFileSize,
|
||||
},
|
||||
Session: SessionSection{MaxKeep: &maxKeep},
|
||||
Security: SecuritySection{
|
||||
EntropyThreshold: &entropyThreshold,
|
||||
RedactHighEntropy: &redactHighEntropy,
|
||||
},
|
||||
Router: RouterSection{
|
||||
ForceTwoStage: &forceTwoStage,
|
||||
},
|
||||
Session: SessionSection{MaxKeep: 20},
|
||||
SLM: SLMSection{
|
||||
StartupTimeout: Duration(5 * time.Second),
|
||||
StartupTimeout: &startupTimeout,
|
||||
ClassifyTimeout: &classifyTimeout,
|
||||
},
|
||||
TUI: TUISection{
|
||||
Theme: "catppuccin",
|
||||
|
||||
@@ -0,0 +1,431 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
"sort"
|
||||
"strings"
|
||||
|
||||
"github.com/BurntSushi/toml"
|
||||
)
|
||||
|
||||
// Severity ranks diagnostic findings for the CLI output and
|
||||
// exit-code decision. Higher numeric value = more severe.
|
||||
type Severity int
|
||||
|
||||
const (
|
||||
// SeverityInfo is a neutral observation (e.g. "field is at
|
||||
// the default value, can be removed"). Never causes a
|
||||
// non-zero exit on its own.
|
||||
SeverityInfo Severity = iota
|
||||
|
||||
// SeverityWarn indicates a likely problem the user should
|
||||
// review (e.g. an invalid enum value, an explicit-zero
|
||||
// pointer field that diverges from the default). Causes
|
||||
// a non-zero exit in CLI mode by default.
|
||||
SeverityWarn
|
||||
|
||||
// SeverityError indicates a hard failure (file unreadable,
|
||||
// file unparseable). Causes a non-zero exit.
|
||||
SeverityError
|
||||
)
|
||||
|
||||
// String returns the lower-case name of the severity for
|
||||
// human-readable output.
|
||||
func (s Severity) String() string {
|
||||
switch s {
|
||||
case SeverityInfo:
|
||||
return "info"
|
||||
case SeverityWarn:
|
||||
return "warn"
|
||||
case SeverityError:
|
||||
return "error"
|
||||
default:
|
||||
return "?"
|
||||
}
|
||||
}
|
||||
|
||||
// MarshalJSON encodes Severity as its lower-case name string
|
||||
// (e.g. "warn", "error") for stable CI/script consumption.
|
||||
// The default Go marshaling would emit the int value, which
|
||||
// is opaque to consumers.
|
||||
func (s Severity) MarshalJSON() ([]byte, error) {
|
||||
return []byte(`"` + s.String() + `"`), nil
|
||||
}
|
||||
|
||||
// Finding is one diagnostic result. The CLI renders these
|
||||
// either as human-readable text or as JSON (--json flag).
|
||||
type Finding struct {
|
||||
Severity Severity `json:"severity"`
|
||||
Path string `json:"path"`
|
||||
Key string `json:"key,omitempty"`
|
||||
Message string `json:"message"`
|
||||
Suggestion string `json:"suggestion,omitempty"`
|
||||
}
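
A self-contained sketch of how a `Finding` serializes with the name-based `Severity` encoding. The types are copied from above; the one deliberate difference is that `MarshalJSON` here delegates to `json.Marshal` for quoting instead of concatenating quote characters, which is equivalent for the three fixed names. The `encode` helper and the sample values are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Severity int

const (
	SeverityInfo Severity = iota
	SeverityWarn
	SeverityError
)

func (s Severity) String() string {
	switch s {
	case SeverityInfo:
		return "info"
	case SeverityWarn:
		return "warn"
	case SeverityError:
		return "error"
	default:
		return "?"
	}
}

// MarshalJSON emits the lower-case name as a JSON string.
func (s Severity) MarshalJSON() ([]byte, error) {
	return json.Marshal(s.String())
}

type Finding struct {
	Severity   Severity `json:"severity"`
	Path       string   `json:"path"`
	Key        string   `json:"key,omitempty"`
	Message    string   `json:"message"`
	Suggestion string   `json:"suggestion,omitempty"`
}

// encode is an illustrative helper that returns the JSON text.
func encode(f Finding) string {
	b, err := json.Marshal(f)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(Finding{
		Severity: SeverityWarn,
		Path:     "config.toml",
		Key:      "router.prefer",
		Message:  "invalid value",
	}))
}
```

The empty `Suggestion` is dropped by `omitempty`, so consumers should treat that key as optional.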
|
||||
|
||||
// Doctor runs diagnostic checks on config files. Constructed
|
||||
// with NewDoctor; reusable across many files. Stateless after
|
||||
// construction — set Defaults to override the comparison
|
||||
// baseline (used in tests; production always uses Defaults()).
|
||||
type Doctor struct {
|
||||
// Defaults is the baseline for "is this field at the
|
||||
// default value" checks. If nil, Defaults() is used.
|
||||
Defaults *Config
|
||||
}
|
||||
|
||||
// NewDoctor returns a Doctor with the production defaults
|
||||
// baseline.
|
||||
func NewDoctor() *Doctor {
|
||||
return &Doctor{Defaults: nil}
|
||||
}
|
||||
|
||||
// DiagnoseFile runs the full diagnostic suite on a single
|
||||
// config file. The returned slice may be empty (file is
|
||||
// clean) or contain findings of any severity.
|
||||
func (d *Doctor) DiagnoseFile(path string) []Finding {
|
||||
data, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
return []Finding{{
|
||||
Severity: SeverityError,
|
||||
Path: path,
|
||||
Message: fmt.Sprintf("read: %v", err),
|
||||
}}
|
||||
}
|
||||
|
||||
var cfg Config
|
||||
meta, err := toml.Decode(string(data), &cfg)
|
||||
if err != nil {
|
||||
return []Finding{{
|
||||
Severity: SeverityError,
|
||||
Path: path,
|
||||
Message: fmt.Sprintf("parse: %v", err),
|
||||
}}
|
||||
}
|
||||
|
||||
defaults := d.Defaults
|
||||
if defaults == nil {
|
||||
def := Defaults()
|
||||
defaults = &def
|
||||
}
|
||||
|
||||
var findings []Finding
|
||||
findings = append(findings, d.detectUnknownKeys(path, meta)...)
|
||||
findings = append(findings, d.detectInvalidEnums(path, &cfg)...)
|
||||
findings = append(findings, d.detectExplicitZeros(path, &cfg, defaults)...)
|
||||
return findings
|
||||
}
|
||||
|
||||
// DiagnoseFiles runs DiagnoseFile on each path in turn and
// returns the concatenated findings, sorted by path, then by
// descending severity, then by key, so the output is stable
// regardless of the input order.
|
||||
func (d *Doctor) DiagnoseFiles(paths []string) []Finding {
|
||||
var findings []Finding
|
||||
for _, p := range paths {
|
||||
findings = append(findings, d.DiagnoseFile(p)...)
|
||||
}
|
||||
// Stable order for diff-friendly CI output.
|
||||
sort.SliceStable(findings, func(i, j int) bool {
|
||||
if findings[i].Path != findings[j].Path {
|
||||
return findings[i].Path < findings[j].Path
|
||||
}
|
||||
if findings[i].Severity != findings[j].Severity {
|
||||
return findings[i].Severity > findings[j].Severity
|
||||
}
|
||||
return findings[i].Key < findings[j].Key
|
||||
})
|
||||
return findings
|
||||
}
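
The ordering applied by the comparator above can be checked in isolation. This sketch uses a reduced, hypothetical `finding` type with a plain `int` severity (0 = info, 1 = warn, 2 = error) and made-up file names; the comparator body follows the one in `DiagnoseFiles`.

```go
package main

import (
	"fmt"
	"sort"
)

// finding keeps only the three fields the comparator looks at.
type finding struct {
	Path     string
	Severity int
	Key      string
}

// sortFindings orders by path ascending, then severity descending,
// then key ascending. Ties keep their input order (stable sort).
func sortFindings(fs []finding) {
	sort.SliceStable(fs, func(i, j int) bool {
		if fs[i].Path != fs[j].Path {
			return fs[i].Path < fs[j].Path
		}
		if fs[i].Severity != fs[j].Severity {
			return fs[i].Severity > fs[j].Severity
		}
		return fs[i].Key < fs[j].Key
	})
}

func main() {
	fs := []finding{
		{"b.toml", 1, "router.prefer"},
		{"a.toml", 1, "slm.backend"},
		{"a.toml", 2, ""},
		{"a.toml", 1, "permission.mode"},
	}
	sortFindings(fs)
	for _, f := range fs {
		fmt.Printf("%s sev=%d key=%q\n", f.Path, f.Severity, f.Key)
	}
}
```

Errors for a file sort ahead of its warnings, so an unreadable or unparseable file appears first in that file's group.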
|
||||
|
||||
// DiagnoseLayering compares the resolved views of two config
|
||||
// files (typically the global config and a project config)
|
||||
// and surfaces "shadowing" findings: cases where the project
|
||||
// file's value differs from the global's, and the project's
|
||||
// value is at the Go zero (string `""`, int 0, bool false).
|
||||
//
|
||||
// The original 2026-05-24 silent-corruption bug was exactly
|
||||
// this pattern: the project file had `[router] prefer = ""`,
|
||||
// silently shadowing the global's `prefer = "cloud"` because
|
||||
// TOML's "present field wins" semantics treat `""` as a
|
||||
// legitimate value rather than "absent". The doctor catches
|
||||
// it without needing the user to read the merge logic.
|
||||
//
|
||||
// Returns nil if either file is missing or fails to parse (the
// per-file `DiagnoseFile` already reports both cases; a
// layering check without both sides has nothing to compare).
|
||||
func (d *Doctor) DiagnoseLayering(globalPath, projectPath string) []Finding {
|
||||
if _, err := os.Stat(globalPath); os.IsNotExist(err) {
|
||||
return nil
|
||||
}
|
||||
if _, err := os.Stat(projectPath); os.IsNotExist(err) {
|
||||
return nil
|
||||
}
|
||||
|
||||
var globalCfg, projectCfg Config
|
||||
if _, err := toml.DecodeFile(globalPath, &globalCfg); err != nil {
|
||||
return nil
|
||||
}
|
||||
if _, err := toml.DecodeFile(projectPath, &projectCfg); err != nil {
|
||||
return nil
|
||||
}
|
||||
|
||||
// For non-pointer string fields we need to know whether
|
||||
// the key was actually present in the project's source —
|
||||
// an absent key and a present-empty key look identical in
|
||||
// the typed Config. Parse the project to a raw map for
|
||||
// per-key presence checks.
|
||||
var projectRaw map[string]any
|
||||
if _, err := toml.DecodeFile(projectPath, &projectRaw); err != nil {
|
||||
projectRaw = nil
|
||||
}
|
||||
hasKey := func(section, key string) bool {
|
||||
if projectRaw == nil {
|
||||
return false
|
||||
}
|
||||
sec, ok := projectRaw[section].(map[string]any)
|
||||
if !ok {
|
||||
return false
|
||||
}
|
||||
_, present := sec[key]
|
||||
return present
|
||||
}
|
||||
|
||||
defaults := d.Defaults
|
||||
if defaults == nil {
|
||||
def := Defaults()
|
||||
defaults = &def
|
||||
}
|
||||
defRes := defaults.Resolved()
|
||||
|
||||
var findings []Finding
|
||||
|
||||
// Non-pointer string fields. Project's value is in the
|
||||
// source AND is the empty string AND global's value is a
|
||||
// user-set non-default non-empty string → shadowing. (If
|
||||
// the project key is absent, the field inherits — no
|
||||
// shadowing. If global is also empty, both inherit the
|
||||
// default — no shadowing.)
|
||||
type stringField struct {
|
||||
key, projectVal, globalVal string
|
||||
}
|
||||
stringFields := []stringField{
|
||||
{"router.prefer", projectCfg.Router.Prefer, globalCfg.Router.Prefer},
|
||||
{"permission.mode", projectCfg.Permission.Mode, globalCfg.Permission.Mode},
|
||||
{"provider.default", projectCfg.Provider.Default, globalCfg.Provider.Default},
|
||||
{"provider.model", projectCfg.Provider.Model, globalCfg.Provider.Model},
|
||||
}
|
||||
for _, f := range stringFields {
|
||||
// Parse the key to section/field. The format is
|
||||
// "section.field" — split on the first dot.
|
||||
section, field, _ := strings.Cut(f.key, ".")
|
||||
if !hasKey(section, field) {
|
||||
continue
|
||||
}
|
||||
if f.projectVal != "" {
|
||||
continue
|
||||
}
|
||||
if f.globalVal == "" || f.globalVal == defaultStringFor(f.key) {
|
||||
continue
|
||||
}
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: projectPath,
|
||||
Key: f.key,
|
||||
Message: fmt.Sprintf(
|
||||
"project's %s=%q shadows global's %s=%q; the merged value is %q, not the user's global intent",
|
||||
f.key, f.projectVal, f.key, f.globalVal, f.projectVal),
|
||||
Suggestion: "delete the line in the project config to inherit the global value, or set an explicit non-empty value",
|
||||
})
|
||||
}
|
||||
|
||||
// Pointer-converted numeric fields. Project has *0
|
||||
// (explicit zero) when global has a non-default value
|
||||
// → shadowing. (The "is zero" check is on the raw pointer,
|
||||
// not the resolved value, because nil and *0 are different:
|
||||
// nil means "absent" — inherit global — and *0 means
|
||||
// "explicit zero" — override global. The latter is the
|
||||
// bug case.)
|
||||
if projectCfg.Provider.MaxTokens != nil && *projectCfg.Provider.MaxTokens == 0 &&
|
||||
globalCfg.Provider.MaxTokens != nil && *globalCfg.Provider.MaxTokens != defRes.Provider.MaxTokens {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: projectPath,
|
||||
Key: "provider.max_tokens",
|
||||
Message: fmt.Sprintf(
|
||||
"project's provider.max_tokens=0 shadows global's provider.max_tokens=%d",
|
||||
*globalCfg.Provider.MaxTokens),
|
||||
Suggestion: "delete the line to inherit the global value, or set an explicit non-zero value",
|
||||
})
|
||||
}
|
||||
|
||||
return findings
|
||||
}
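
The presence check at the heart of the string-field shadowing rule can be exercised without a TOML parser. This sketch uses a hand-built `map[string]any` as a stand-in for the decoded project file; `shadows` is a hypothetical condensation of the loop body above and omits the comparison against `defaultStringFor`.

```go
package main

import "fmt"

// hasKey reports whether section.key is present in a raw decoded
// table. A missing section or a non-table value counts as absent.
func hasKey(raw map[string]any, section, key string) bool {
	sec, ok := raw[section].(map[string]any)
	if !ok {
		return false
	}
	_, present := sec[key]
	return present
}

// shadows reports whether a present-but-empty project value hides
// a non-empty global value.
func shadows(raw map[string]any, section, key, projectVal, globalVal string) bool {
	return hasKey(raw, section, key) && projectVal == "" && globalVal != ""
}

func main() {
	// Stand-in for a project file containing [router] prefer = ""
	// and no [provider] table.
	raw := map[string]any{"router": map[string]any{"prefer": ""}}

	fmt.Println(shadows(raw, "router", "prefer", "", "cloud"))       // true: key present and empty
	fmt.Println(shadows(raw, "provider", "model", "", "some-model")) // false: key absent, inherits
}
```

The typed `Config` alone cannot separate these two cases, since both decode to `""`; the raw map is what makes the distinction possible.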
|
||||
|
||||
// defaultStringFor returns the documented default value for a
|
||||
// given non-pointer string config key. Used by the layering
|
||||
// check to distinguish "global is at the default" (no
|
||||
// shadowing, nothing to do) from "global has a user-set
|
||||
// value" (which the project might shadow).
|
||||
func defaultStringFor(key string) string {
|
||||
switch key {
|
||||
case "router.prefer":
|
||||
return "" // prefer defaults to "auto" but resolves to ""
|
||||
case "permission.mode":
|
||||
return "auto"
|
||||
case "provider.default":
|
||||
return ""
|
||||
case "provider.model":
|
||||
return ""
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
// detectUnknownKeys surfaces keys in the source that don't map
// to any Config field, at any nesting depth (meta.Undecoded
// reports nested keys as well as top-level ones). The decoder
// ignores them silently today; doctor flags them so the user can
// clean up typos like `[provdier]` or removed-schema leftovers.
|
||||
func (d *Doctor) detectUnknownKeys(path string, meta toml.MetaData) []Finding {
|
||||
var findings []Finding
|
||||
for _, k := range meta.Undecoded() {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: k.String(),
|
||||
Message: fmt.Sprintf("unknown top-level key %q (not in the current Config schema)", k.String()),
|
||||
Suggestion: "remove the section or rename to a known key",
|
||||
})
|
||||
}
|
||||
return findings
|
||||
}
|
||||
|
||||
// detectInvalidEnums checks enum-typed string fields against
|
||||
// their parsers. The current set is intentionally small —
|
||||
// only fields with a documented value space and a parser
|
||||
// function. Add more as the surface grows.
|
||||
func (d *Doctor) detectInvalidEnums(path string, cfg *Config) []Finding {
|
||||
var findings []Finding
|
||||
|
||||
// permission.mode — must be a permission.Mode constant.
|
||||
if cfg.Permission.Mode != "" && !validPermissionMode(cfg.Permission.Mode) {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: "permission.mode",
|
||||
Message: fmt.Sprintf("invalid permission.mode %q (expected one of: default, accept_edits, bypass, deny, plan, auto)", cfg.Permission.Mode),
|
||||
Suggestion: "fix the value, or remove the line to use the default",
|
||||
})
|
||||
}
|
||||
|
||||
// router.prefer — must parse via router.ParsePreferPolicy.
|
||||
// (That parser accepts "" and "auto" as valid, so we skip
|
||||
// the check on those.)
|
||||
if cfg.Router.Prefer != "" && cfg.Router.Prefer != "auto" &&
|
||||
!validRouterPrefer(cfg.Router.Prefer) {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: "router.prefer",
|
||||
Message: fmt.Sprintf("invalid router.prefer %q (expected \"local\", \"cloud\", or \"auto\")", cfg.Router.Prefer),
|
||||
Suggestion: "fix the value, or remove the line to use the default",
|
||||
})
|
||||
}
|
||||
|
||||
// slm.backend — must be a recognized backend.
|
||||
if cfg.SLM.Backend != "" && !validSLMBackend(cfg.SLM.Backend) {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: "slm.backend",
|
||||
Message: fmt.Sprintf("invalid slm.backend %q (expected auto, ollama, llamacpp, llamafile, openaicompat, or disabled)", cfg.SLM.Backend),
|
||||
Suggestion: "fix the value, or remove the line to use the default",
|
||||
})
|
||||
}
|
||||
|
||||
return findings
|
||||
}
|
||||
|
||||
// detectExplicitZeros surfaces pointer-converted fields whose
|
||||
// value is *zero (the user explicitly wrote a zero in the
|
||||
// file) and the default's resolved value is non-zero. These
|
||||
// are the cases where the user might have a typo (e.g.
|
||||
// `max_tokens = 0` when they meant 8192) or an explicit
|
||||
// override. The upgrade-config preserves them as user
|
||||
// intent; the doctor surfaces them for review.
|
||||
func (d *Doctor) detectExplicitZeros(path string, cfg *Config, defaults *Config) []Finding {
|
||||
var findings []Finding
|
||||
|
||||
resolved := cfg.Resolved()
|
||||
defaultsResolved := defaults.Resolved()
|
||||
|
||||
// Provider.MaxTokens
|
||||
if cfg.Provider.MaxTokens != nil && *cfg.Provider.MaxTokens == 0 && resolved.Provider.MaxTokens != defaultsResolved.Provider.MaxTokens {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: "provider.max_tokens",
|
||||
Message: fmt.Sprintf("explicit zero for provider.max_tokens (resolved to %d); the default is %d. Is this intentional?", resolved.Provider.MaxTokens, defaultsResolved.Provider.MaxTokens),
|
||||
})
|
||||
}
|
||||
|
||||
// Tools.MaxFileSize
|
||||
if cfg.Tools.MaxFileSize != nil && *cfg.Tools.MaxFileSize == 0 && resolved.Tools.MaxFileSize != defaultsResolved.Tools.MaxFileSize {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: "tools.max_file_size",
|
||||
Message: fmt.Sprintf("explicit zero for tools.max_file_size (resolved to %d); the default is %d. Zero disables the size cap.", resolved.Tools.MaxFileSize, defaultsResolved.Tools.MaxFileSize),
|
||||
})
|
||||
}
|
||||
|
||||
// Session.MaxKeep
|
||||
if cfg.Session.MaxKeep != nil && *cfg.Session.MaxKeep == 0 && resolved.Session.MaxKeep != defaultsResolved.Session.MaxKeep {
|
||||
findings = append(findings, Finding{
|
||||
Severity: SeverityWarn,
|
||||
Path: path,
|
||||
Key: "session.max_keep",
|
||||
Message: fmt.Sprintf("explicit zero for session.max_keep (resolved to %d); the default is %d. Zero disables session retention.", resolved.Session.MaxKeep, defaultsResolved.Session.MaxKeep),
|
||||
})
|
||||
}
|
||||
|
||||
return findings
|
||||
}
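The explicit-zero check above depends on pointer fields to tell "key absent" apart from "key written as 0". A minimal sketch of that distinction follows, assuming a trimmed-down section type; `providerSection`, `resolveMaxTokens`, `isExplicitZero`, and the 8192 default are illustrative stand-ins, not the package's real definitions.

```go
package main

import "fmt"

// defaultMaxTokens is an assumed default, borrowed from the 8192 example
// in the comment above; the real default is defined elsewhere.
const defaultMaxTokens int64 = 8192

// providerSection is a hypothetical stand-in for the provider config section.
type providerSection struct {
	MaxTokens *int64 // nil means the key was absent from the file
}

// resolveMaxTokens returns the user's value when one was written,
// and the default when the key was absent.
func resolveMaxTokens(p providerSection) int64 {
	if p.MaxTokens == nil {
		return defaultMaxTokens
	}
	return *p.MaxTokens
}

// isExplicitZero reports whether the user literally wrote a zero.
func isExplicitZero(p providerSection) bool {
	return p.MaxTokens != nil && *p.MaxTokens == 0
}

// int64Ptr is a small helper for building pointer fields in examples.
func int64Ptr(v int64) *int64 { return &v }

func main() {
	absent := providerSection{}
	zero := providerSection{MaxTokens: int64Ptr(0)}
	fmt.Println(resolveMaxTokens(absent), isExplicitZero(absent))
	fmt.Println(resolveMaxTokens(zero), isExplicitZero(zero))
}
```

With a plain `int64` field both cases would decode to 0 and the doctor could not flag the second one.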
|
||||
|
||||
// validPermissionMode returns true if s is a recognized
|
||||
// permission mode string. Kept as a local function instead of
|
||||
// importing permission.Mode.Valid() so doctor stays
|
||||
// independent of the permission package's Type system
|
||||
// (permission.Mode is a typed string with .Valid() but using
|
||||
// it would create a coupling we'd rather avoid here).
|
||||
func validPermissionMode(s string) bool {
|
||||
switch s {
|
||||
case "default", "accept_edits", "bypass", "deny", "plan", "auto":
|
||||
return true
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
// validRouterPrefer returns true if s is a recognized router
|
||||
// preference. Mirrors the policy table in router.ParsePreferPolicy
|
||||
// without importing that package (the parser lives in
|
||||
// internal/router; doctor is in internal/config and the
|
||||
// layering would invite import cycles if a future router
|
||||
// subpackage ever imports config).
|
||||
func validRouterPrefer(s string) bool {
|
||||
switch s {
|
||||
case "auto", "local", "cloud":
|
||||
return true
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
// validSLMBackend returns true if s is a recognized SLM
|
||||
// backend name. Mirrors the constants in internal/slm
|
||||
// (auto / ollama / llamacpp / llamafile / openaicompat /
|
||||
// disabled) without importing that package.
|
||||
func validSLMBackend(s string) bool {
|
||||
switch s {
|
||||
case "auto", "ollama", "llamacpp", "llamafile", "openaicompat", "disabled":
|
||||
return true
|
||||
}
|
||||
return false
|
||||
}
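The three validators above each hard-code an allow-list in a `switch`. One possible alternative, sketched here under the assumption that a single table is acceptable, keys the same lists by config key. `allowedValues` and `validValue` are hypothetical names that do not appear in the diff; the value lists are copied from the validators above.

```go
package main

import "fmt"

// allowedValues is a hypothetical lookup table mirroring the three
// switch statements above. It is not part of the diff.
var allowedValues = map[string][]string{
	"permission.mode": {"default", "accept_edits", "bypass", "deny", "plan", "auto"},
	"router.prefer":   {"auto", "local", "cloud"},
	"slm.backend":     {"auto", "ollama", "llamacpp", "llamafile", "openaicompat", "disabled"},
}

// validValue reports whether s is an accepted value for key.
// An unknown key has no accepted values, so it always yields false.
func validValue(key, s string) bool {
	for _, v := range allowedValues[key] {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(validValue("slm.backend", "ollama"))
	fmt.Println(validValue("router.prefer", "yes"))
}
```

The table form also lets a finding's "expected ..." message be generated from the same list, at the cost of losing the per-function doc comments that explain why each package is not imported.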
|
||||
@@ -0,0 +1,409 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// TestDiagnose_ValidFileNoFindings sanity-checks the no-op path:
|
||||
// a freshly-written config (after upgrade-config) produces zero
|
||||
// findings because every field either matches the default or
|
||||
// is a legitimate user value.
|
||||
func TestDiagnose_ValidFileNoFindings(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
for _, f := range fs {
|
||||
if f.Severity >= SeverityWarn {
|
||||
t.Errorf("unexpected warn/error finding for valid file: %+v", f)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_MissingFileReturnsErrorFinding verifies the
|
||||
// error path: a path that doesn't exist produces a single
|
||||
// SeverityError finding.
|
||||
func TestDiagnose_MissingFileReturnsErrorFinding(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "nonexistent.toml")
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
if len(fs) != 1 {
|
||||
t.Fatalf("len(findings) = %d, want 1", len(fs))
|
||||
}
|
||||
if fs[0].Severity != SeverityError {
|
||||
t.Errorf("Severity = %v, want SeverityError", fs[0].Severity)
|
||||
}
|
||||
if !strings.Contains(fs[0].Message, "read:") {
|
||||
t.Errorf("Message = %q, want it to mention the read error", fs[0].Message)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_CorruptFileReturnsErrorFinding verifies the
|
||||
// parse-error path: a file with invalid TOML produces a
|
||||
// SeverityError finding with a parse message.
|
||||
func TestDiagnose_CorruptFileReturnsErrorFinding(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[broken\nthis = 'is not valid"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
if len(fs) != 1 {
|
||||
t.Fatalf("len(findings) = %d, want 1", len(fs))
|
||||
}
|
||||
if fs[0].Severity != SeverityError {
|
||||
t.Errorf("Severity = %v, want SeverityError", fs[0].Severity)
|
||||
}
|
||||
if !strings.Contains(fs[0].Message, "parse:") {
|
||||
t.Errorf("Message = %q, want it to mention the parse error", fs[0].Message)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_UnknownTopLevelKeysAreWarned verifies that keys
|
||||
// in the source file that don't map to any Config field
|
||||
// surface as SeverityWarn findings. Decoder ignores them
|
||||
// silently today; doctor surfaces them.
|
||||
func TestDiagnose_UnknownTopLevelKeysAreWarned(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[unknown_section]\nfoo = 1\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Severity == SeverityWarn && strings.Contains(f.Key, "unknown_section") {
|
||||
found = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected warning for unknown_section, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_InvalidPermissionModeIsWarned verifies that an
|
||||
// invalid permission.mode value surfaces as SeverityWarn.
|
||||
// The mode is a string that must be one of the documented
|
||||
// permission.Mode constants.
|
||||
func TestDiagnose_InvalidPermissionModeIsWarned(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Severity == SeverityWarn && f.Key == "permission.mode" {
|
||||
found = true
|
||||
if !strings.Contains(f.Message, "yes") {
|
||||
t.Errorf("Message = %q, want it to mention the invalid value 'yes'", f.Message)
|
||||
}
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected warning for invalid permission.mode, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_ValidPermissionModeIsClean verifies the
|
||||
// "explicit-valid" path: a user-set valid mode produces no
|
||||
// finding for permission.mode.
|
||||
func TestDiagnose_ValidPermissionModeIsClean(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[permission]\nmode = \"deny\"\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
for _, f := range fs {
|
||||
if f.Key == "permission.mode" {
|
||||
t.Errorf("unexpected finding for valid mode: %+v", f)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_InvalidRouterPreferIsWarned verifies that an
|
||||
// invalid router.prefer value surfaces as SeverityWarn.
|
||||
func TestDiagnose_InvalidRouterPreferIsWarned(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[router]\nprefer = \"yes\"\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Severity == SeverityWarn && f.Key == "router.prefer" {
|
||||
found = true
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected warning for invalid router.prefer, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_ExplicitZeroProviderMaxTokensIsWarned verifies
|
||||
// the "explicit zero" case the upgrade-config preserves but
|
||||
// the doctor surfaces: a user-set *int64(0) on a pointer
|
||||
// field whose default is non-zero is probably a mistake.
|
||||
// SeverityWarn (not Error) because the user might have set
|
||||
// it intentionally.
|
||||
func TestDiagnose_ExplicitZeroProviderMaxTokensIsWarned(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 0\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Severity == SeverityWarn && f.Key == "provider.max_tokens" {
|
||||
found = true
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected warning for explicit-zero max_tokens, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_DefaultProviderMaxTokensClean documents the
|
||||
// "user set to default" case: the cleaner drops these, and
|
||||
// the doctor should NOT warn about them (the user did the
|
||||
// right thing by setting an explicit value that matches the
|
||||
// default).
|
||||
func TestDiagnose_DefaultProviderMaxTokensClean(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFile(path)
|
||||
for _, f := range fs {
|
||||
if f.Key == "provider.max_tokens" {
|
||||
t.Errorf("unexpected finding for default-equivalent max_tokens: %+v", f)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnose_DiagnoseManyAggregates verifies the multi-file
|
||||
// API: paths is a list of files to scan, the result is the
|
||||
// concatenation of per-file findings.
|
||||
func TestDiagnose_DiagnoseManyAggregates(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
good := filepath.Join(dir, "good.toml")
|
||||
bad := filepath.Join(dir, "bad.toml")
|
||||
_ = os.WriteFile(good, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644)
|
||||
_ = os.WriteFile(bad, []byte("[permission]\nmode = \"yes\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseFiles([]string{good, bad})
|
||||
if len(fs) < 1 {
|
||||
t.Fatalf("len(findings) = %d, want >= 1", len(fs))
|
||||
}
|
||||
// The bad file should contribute at least one finding.
|
||||
foundBad := false
|
||||
for _, f := range fs {
|
||||
if f.Path == bad {
|
||||
foundBad = true
|
||||
}
|
||||
}
|
||||
if !foundBad {
|
||||
t.Errorf("expected finding for %s, got %+v", bad, fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSeverity_String verifies the human-readable form of
|
||||
// Severity values for the CLI's text output.
|
||||
func TestSeverity_String(t *testing.T) {
|
||||
cases := []struct {
|
||||
sev Severity
|
||||
want string
|
||||
}{
|
||||
{SeverityInfo, "info"},
|
||||
{SeverityWarn, "warn"},
|
||||
{SeverityError, "error"},
|
||||
}
|
||||
for _, c := range cases {
|
||||
if got := c.sev.String(); got != c.want {
|
||||
t.Errorf("Severity(%d).String() = %q, want %q", c.sev, got, c.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_ProjectShadowsGlobal_PreferEmpty reproduces
// the original 2026-05-24 silent-corruption bug: the project
|
||||
// file has `router.prefer = ""` which shadows the global's
|
||||
// `router.prefer = "cloud"`. Doctor must surface this.
|
||||
func TestDiagnoseLayering_ProjectShadowsGlobal_PreferEmpty(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
global := filepath.Join(dir, "global.toml")
|
||||
project := filepath.Join(dir, "project.toml")
|
||||
|
||||
_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
|
||||
_ = os.WriteFile(project, []byte("[router]\nprefer = \"\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(global, project)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Key == "router.prefer" && f.Severity == SeverityWarn {
|
||||
found = true
|
||||
if !strings.Contains(f.Message, "shadow") {
|
||||
t.Errorf("Message = %q, want it to mention shadowing", f.Message)
|
||||
}
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected shadowing warning for router.prefer, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_NoShadowWhenValuesMatch verifies the
|
||||
// happy path: when the project sets its own non-empty value
// ("local" over the global's "cloud"), the override is
// intentional and no shadowing finding is emitted.
|
||||
func TestDiagnoseLayering_NoShadowWhenValuesMatch(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
global := filepath.Join(dir, "global.toml")
|
||||
project := filepath.Join(dir, "project.toml")
|
||||
|
||||
_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
|
||||
_ = os.WriteFile(project, []byte("[router]\nprefer = \"local\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(global, project)
|
||||
for _, f := range fs {
|
||||
if f.Key == "router.prefer" {
|
||||
t.Errorf("unexpected finding when project overrides global intentionally: %+v", f)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_NoShadowWhenProjectInheritsDefault
|
||||
// documents the inheritance path: when the project's field
|
||||
// is absent (resolves to the default), it inherits the
|
||||
// global's value (or the default if global is also default).
|
||||
// Neither case is shadowing.
|
||||
func TestDiagnoseLayering_NoShadowWhenProjectInheritsDefault(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
global := filepath.Join(dir, "global.toml")
|
||||
project := filepath.Join(dir, "project.toml")
|
||||
|
||||
// Global has a non-default value, project has no router
|
||||
// section at all. The project inherits the global's "cloud"
|
||||
// — no shadowing.
|
||||
_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
|
||||
_ = os.WriteFile(project, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(global, project)
|
||||
for _, f := range fs {
|
||||
if f.Key == "router.prefer" {
|
||||
t.Errorf("unexpected shadowing finding when project has no [router] section: %+v", f)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_ProjectShadowsGlobal_PermissionMode
|
||||
// verifies another common shadowing case: project has
|
||||
// `permission.mode = ""` while global has `permission.mode =
|
||||
// "deny"`. The merged value is "" (default "auto"), silently
|
||||
// overriding the user's intent.
|
||||
func TestDiagnoseLayering_ProjectShadowsGlobal_PermissionMode(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
global := filepath.Join(dir, "global.toml")
|
||||
project := filepath.Join(dir, "project.toml")
|
||||
|
||||
_ = os.WriteFile(global, []byte("[permission]\nmode = \"deny\"\n"), 0o644)
|
||||
_ = os.WriteFile(project, []byte("[permission]\nmode = \"\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(global, project)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Key == "permission.mode" && f.Severity == SeverityWarn {
|
||||
found = true
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected shadowing warning for permission.mode, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_ProjectShadowsGlobal_ProviderDefault
|
||||
// documents the provider.default shadowing case: project has
|
||||
// empty default, global has a real one. The user's "anthropic"
|
||||
// at the global level is silently overridden.
|
||||
func TestDiagnoseLayering_ProjectShadowsGlobal_ProviderDefault(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
global := filepath.Join(dir, "global.toml")
|
||||
project := filepath.Join(dir, "project.toml")
|
||||
|
||||
_ = os.WriteFile(global, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644)
|
||||
_ = os.WriteFile(project, []byte("[provider]\ndefault = \"\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(global, project)
|
||||
found := false
|
||||
for _, f := range fs {
|
||||
if f.Key == "provider.default" && f.Severity == SeverityWarn {
|
||||
found = true
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected shadowing warning for provider.default, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_MissingGlobalIsNoOp documents the
|
||||
// "no global config" case: doctor cannot run a layering
|
||||
// check without a global baseline, so it returns no findings.
|
||||
func TestDiagnoseLayering_MissingGlobalIsNoOp(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
project := filepath.Join(dir, "project.toml")
|
||||
_ = os.WriteFile(project, []byte("[router]\nprefer = \"\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(filepath.Join(dir, "nonexistent-global.toml"), project)
|
||||
if len(fs) != 0 {
|
||||
t.Errorf("expected no findings when global is missing, got %+v", fs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestDiagnoseLayering_MissingProjectIsNoOp mirrors the above:
|
||||
// without a project file there's nothing to shadow.
|
||||
func TestDiagnoseLayering_MissingProjectIsNoOp(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
global := filepath.Join(dir, "global.toml")
|
||||
_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
|
||||
|
||||
doc := NewDoctor()
|
||||
fs := doc.DiagnoseLayering(global, filepath.Join(dir, "nonexistent-project.toml"))
|
||||
if len(fs) != 0 {
|
||||
t.Errorf("expected no findings when project is missing, got %+v", fs)
|
||||
}
|
||||
}
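The layering tests above pin down one rule: a project value that is present but empty shadows a non-empty global value, while an absent key or a deliberate non-empty override does not. A minimal sketch of that predicate follows, assuming presence is modelled with `*string`. `shadowsGlobal` and `strPtr` are hypothetical names, and the real `DiagnoseLayering` may be structured differently.

```go
package main

import "fmt"

// strPtr is a small helper for building optional strings in examples.
func strPtr(s string) *string { return &s }

// shadowsGlobal reports whether the project value silently hides a real
// global value. nil means the key (or the whole file) is absent.
func shadowsGlobal(global, project *string) bool {
	if global == nil || *global == "" {
		return false // nothing meaningful to shadow
	}
	// Only an explicit empty string counts; an absent key inherits, and
	// a non-empty value is treated as an intentional override.
	return project != nil && *project == ""
}

func main() {
	fmt.Println(shadowsGlobal(strPtr("cloud"), strPtr("")))      // project wrote prefer = ""
	fmt.Println(shadowsGlobal(strPtr("cloud"), strPtr("local"))) // intentional override
	fmt.Println(shadowsGlobal(strPtr("cloud"), nil))             // project inherits
	fmt.Println(shadowsGlobal(nil, strPtr("")))                  // no global baseline
}
```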
|
||||
@@ -92,9 +92,26 @@ func ProjectRoot() string {
|
||||
}
|
||||
|
||||
func projectConfigPath() string {
|
||||
return ProjectConfigPath()
|
||||
}
|
||||
|
||||
// ProjectConfigPath returns the path to the project config file
|
||||
// for the current working directory (.gnoma/config.toml under
|
||||
// the project root). Exported so the `gnoma upgrade-config` CLI
|
||||
// (and any future callers that need to point at the project
|
||||
// config) can use it.
|
||||
func ProjectConfigPath() string {
|
||||
return filepath.Join(ProjectRoot(), ".gnoma", "config.toml")
|
||||
}
|
||||
|
||||
// ProjectConfigPathFor returns the project config path for an
|
||||
// arbitrary project root. Used by `gnoma doctor --all-projects`
|
||||
// to enumerate registry entries without `chdir`-ing into each
|
||||
// project.
|
||||
func ProjectConfigPathFor(projectRoot string) string {
|
||||
return filepath.Join(projectRoot, ".gnoma", "config.toml")
|
||||
}
|
||||
|
||||
func applyEnv(cfg *Config) {
|
||||
envKeys := map[string]string{
|
||||
"mistral": "MISTRAL_API_KEY",
|
||||
|
||||
@@ -218,8 +218,8 @@ claude = "claude-work"
|
||||
if cfg.Provider.Model != "claude-base" {
|
||||
t.Errorf("Model = %q, want claude-base (base preserved)", cfg.Provider.Model)
|
||||
}
|
||||
-if cfg.Provider.MaxTokens != 4096 {
-t.Errorf("MaxTokens = %d, want 4096 (base preserved)", cfg.Provider.MaxTokens)
+if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 4096 {
+t.Errorf("MaxTokens = %v, want *4096 (base preserved)", cfg.Provider.MaxTokens)
|
||||
}
|
||||
// Map per-key merge.
|
||||
if cfg.Provider.APIKeys["anthropic"] != "BASE_A" {
|
||||
|
||||
@@ -0,0 +1,152 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sort"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// ProjectEntry is one row in the project registry. The registry
|
||||
// is purely local — written to ~/.config/gnoma/projects.json and
|
||||
// never sent off-machine. The shape is stable for the v0.4.x
|
||||
// series; the schema-version key is reserved for future
|
||||
// migrations.
|
||||
type ProjectEntry struct {
|
||||
Path string `json:"path"`
|
||||
FirstSeen time.Time `json:"first_seen"`
|
||||
LastSeen time.Time `json:"last_seen"`
|
||||
SessionCount int `json:"session_count"`
|
||||
}
|
||||
|
||||
// Registry is the on-disk list of projects gnoma has been
|
||||
// launched in. Used by:
|
||||
// - `gnoma doctor --all-projects` (Phase 3)
|
||||
// - `gnoma upgrade-config --all` (Phase 4 --all-projects)
|
||||
// - `gnoma sessions --all` picker (cross-project resume)
|
||||
// - `gnoma stats` (local-only aggregate metrics)
|
||||
//
|
||||
// Loaded once at startup, mutated in-process, saved atomically.
|
||||
// The struct is safe for concurrent Record/Prune calls (each
|
||||
// call locks the mutex), but in the typical flow only one
|
||||
// goroutine (main) writes to it.
|
||||
type Registry struct {
|
||||
path string // unexported, so encoding/json never serializes it
|
||||
|
||||
mu sync.Mutex
|
||||
Projects []ProjectEntry `json:"projects"`
|
||||
}
|
||||
|
||||
// RegistryFilePath returns the canonical path to the registry
|
||||
// file (~/.config/gnoma/projects.json). Exported so callers
|
||||
// (and tests) can inspect / delete the file.
|
||||
func RegistryFilePath() string {
|
||||
return filepath.Join(GlobalConfigDir(), "projects.json")
|
||||
}
|
||||
|
||||
// LoadRegistry reads the registry from the canonical path
|
||||
// (~/.config/gnoma/projects.json). A missing file is not an
|
||||
// error: returns an empty Registry. A corrupt file is an error
|
||||
// — silent zero-ing on corruption would let a broken file
|
||||
// accumulate stale state indefinitely.
|
||||
func LoadRegistry() (*Registry, error) {
|
||||
return LoadRegistryAt(RegistryFilePath())
|
||||
}
|
||||
|
||||
// LoadRegistryAt is the testable variant: load the registry
|
||||
// from an explicit path instead of the canonical one. Used by
|
||||
// the test suite to keep `~/.config/gnoma/projects.json`
|
||||
// untouched.
|
||||
func LoadRegistryAt(path string) (*Registry, error) {
|
||||
r := &Registry{path: path}
|
||||
data, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
if os.IsNotExist(err) {
|
||||
return r, nil
|
||||
}
|
||||
return nil, fmt.Errorf("read registry: %w", err)
|
||||
}
|
||||
if err := json.Unmarshal(data, r); err != nil {
|
||||
return nil, fmt.Errorf("parse registry: %w", err)
|
||||
}
|
||||
return r, nil
|
||||
}
|
||||
|
||||
// Record adds or updates the entry for projectRoot. Bumps
|
||||
// LastSeen and SessionCount for an existing entry; appends a
|
||||
// fresh row for a new path. Saves atomically.
|
||||
//
|
||||
// An empty projectRoot is an error: calling Record with "" is a
// programmer error. Path normalization (e.g. resolving symlinks) is
|
||||
// the caller's responsibility; ProjectRoot() in load.go
|
||||
// already returns an absolute path so the typical caller
|
||||
// doesn't need to think about it.
|
||||
func (r *Registry) Record(projectRoot string) error {
|
||||
if projectRoot == "" {
|
||||
return errors.New("project root is empty")
|
||||
}
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
|
||||
now := time.Now().UTC()
|
||||
for i := range r.Projects {
|
||||
if r.Projects[i].Path == projectRoot {
|
||||
r.Projects[i].LastSeen = now
|
||||
r.Projects[i].SessionCount++
|
||||
return r.saveLocked()
|
||||
}
|
||||
}
|
||||
r.Projects = append(r.Projects, ProjectEntry{
|
||||
Path: projectRoot,
|
||||
FirstSeen: now,
|
||||
LastSeen: now,
|
||||
SessionCount: 1,
|
||||
})
|
||||
return r.saveLocked()
|
||||
}
|
||||
|
||||
// Prune removes entries whose LastSeen is more than staleBefore in the past.
|
||||
// Returns the (sorted) list of pruned paths so callers can
|
||||
// surface them in user-facing output (e.g. `gnoma doctor`).
|
||||
// No-op when nothing is stale.
|
||||
func (r *Registry) Prune(staleBefore time.Duration) ([]string, error) {
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
|
||||
cutoff := time.Now().UTC().Add(-staleBefore)
|
||||
var pruned []string
|
||||
var kept []ProjectEntry
|
||||
for _, p := range r.Projects {
|
||||
if p.LastSeen.Before(cutoff) {
|
||||
pruned = append(pruned, p.Path)
|
||||
} else {
|
||||
kept = append(kept, p)
|
||||
}
|
||||
}
|
||||
if len(pruned) == 0 {
|
||||
return nil, nil
|
||||
}
|
||||
sort.Strings(pruned)
|
||||
r.Projects = kept
|
||||
if err := r.saveLocked(); err != nil {
|
||||
return pruned, err
|
||||
}
|
||||
return pruned, nil
|
||||
}
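The cutoff arithmetic in `Prune` is easier to test when the current time is passed in instead of read from `time.Now()`. The sketch below isolates that partition step under that assumption. `entry`, `partitionStale`, and `demoPartition` are hypothetical names, and the 30-day window is only an example value.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// entry is a trimmed-down, hypothetical stand-in for ProjectEntry.
type entry struct {
	Path     string
	LastSeen time.Time
}

// partitionStale splits entries into those kept and the sorted paths of
// those whose LastSeen is strictly before now minus maxAge.
func partitionStale(entries []entry, now time.Time, maxAge time.Duration) (kept []entry, pruned []string) {
	cutoff := now.Add(-maxAge)
	for _, e := range entries {
		if e.LastSeen.Before(cutoff) {
			pruned = append(pruned, e.Path)
		} else {
			kept = append(kept, e)
		}
	}
	sort.Strings(pruned)
	return kept, pruned
}

// demoPartition runs partitionStale on fixed timestamps so the result
// is deterministic.
func demoPartition() (keptCount int, pruned []string) {
	now := time.Date(2026, 5, 24, 12, 0, 0, 0, time.UTC)
	day := 24 * time.Hour
	entries := []entry{
		{Path: "/p/old", LastSeen: now.Add(-90 * day)},
		{Path: "/p/new", LastSeen: now.Add(-1 * day)},
		{Path: "/p/edge", LastSeen: now.Add(-30 * day)}, // exactly at the cutoff
	}
	kept, pruned := partitionStale(entries, now, 30*day)
	return len(kept), pruned
}

func main() {
	n, pruned := demoPartition()
	fmt.Println(n, pruned)
}
```

Because `Before` is strict, an entry last seen exactly at the cutoff is kept, which matches the comparison used in `Prune` above.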
|
||||
|
||||
// saveLocked writes the registry to disk atomically. The
|
||||
// caller must hold r.mu.
|
||||
func (r *Registry) saveLocked() error {
|
||||
if err := os.MkdirAll(filepath.Dir(r.path), 0o755); err != nil {
|
||||
return fmt.Errorf("create registry dir: %w", err)
|
||||
}
|
||||
data, err := json.MarshalIndent(r, "", " ")
|
||||
if err != nil {
|
||||
return fmt.Errorf("marshal registry: %w", err)
|
||||
}
|
||||
return writeAtomicBytes(r.path, data)
|
||||
}
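`saveLocked` delegates to `writeAtomicBytes`, which is not part of this hunk. A common way to get that behaviour is to write a temporary file in the same directory and rename it over the target. The sketch below shows that pattern under that assumption. `writeAtomic` and `demoAtomicWrite` are hypothetical, and the real helper may differ (for example in file permissions or temp-file naming).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic is a hypothetical stand-in for writeAtomicBytes: it writes
// data to a temp file next to path, then renames it into place.
func writeAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp: %w", err)
	}
	tmpName := tmp.Name()
	// Clean up the temp file on failure. After a successful rename the
	// name no longer exists and this Remove fails harmlessly.
	defer os.Remove(tmpName)

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return fmt.Errorf("write temp: %w", err)
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return fmt.Errorf("sync temp: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("close temp: %w", err)
	}
	return os.Rename(tmpName, path)
}

// demoAtomicWrite writes a file into a fresh temp directory and reports
// the content read back plus the number of directory entries left behind.
func demoAtomicWrite() (string, int, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "projects.json")
	if err := writeAtomic(path, []byte("{}")); err != nil {
		return "", 0, err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(data), len(entries), nil
}

func main() {
	content, n, err := demoAtomicWrite()
	fmt.Println(content, n, err)
}
```

Creating the temp file in the target's own directory keeps the rename on one filesystem, which is what lets it replace the file in a single step on POSIX systems.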
|
||||
@@ -0,0 +1,357 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sort"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// TestRegistry_LoadAt_MissingFileReturnsEmpty verifies the
|
||||
// "no file yet" path: LoadRegistryAt returns a fresh, empty
|
||||
// registry with no error, so first-run users don't see a
|
||||
// "no such file" error.
|
||||
func TestRegistry_LoadAt_MissingFileReturnsEmpty(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
reg, err := LoadRegistryAt(path)
|
||||
if err != nil {
|
||||
t.Fatalf("LoadRegistryAt: %v", err)
|
||||
}
|
||||
if reg == nil {
|
||||
t.Fatal("LoadRegistryAt returned nil registry")
|
||||
}
|
||||
if len(reg.Projects) != 0 {
|
||||
t.Errorf("len(Projects) = %d, want 0", len(reg.Projects))
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_LoadAt_ValidFileParses verifies the load path
|
||||
// against a known-good file written by a previous save.
|
||||
func TestRegistry_LoadAt_ValidFileParses(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
seed := Registry{
|
||||
Projects: []ProjectEntry{
|
||||
{
|
||||
Path: "/home/user/git/foo",
|
||||
FirstSeen: time.Date(2026, 4, 15, 10, 30, 0, 0, time.UTC),
|
||||
LastSeen: time.Date(2026, 5, 24, 19, 23, 0, 0, time.UTC),
|
||||
SessionCount: 47,
|
||||
},
|
||||
},
|
||||
}
|
||||
data, _ := json.MarshalIndent(&seed, "", " ")
|
||||
if err := os.WriteFile(path, data, 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
reg, err := LoadRegistryAt(path)
|
||||
if err != nil {
|
||||
t.Fatalf("LoadRegistryAt: %v", err)
|
||||
}
|
||||
if len(reg.Projects) != 1 {
|
||||
t.Fatalf("len(Projects) = %d, want 1", len(reg.Projects))
|
||||
}
|
||||
got := reg.Projects[0]
|
||||
if got.Path != "/home/user/git/foo" {
|
||||
t.Errorf("Path = %q, want /home/user/git/foo", got.Path)
|
||||
}
|
||||
if got.SessionCount != 47 {
|
||||
t.Errorf("SessionCount = %d, want 47", got.SessionCount)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_LoadAt_CorruptFileErrors verifies that a malformed
|
||||
// JSON file produces an error, not a silent zero-valued registry.
|
||||
// Silent zero-ing would let file corruption go unnoticed.
|
||||
func TestRegistry_LoadAt_CorruptFileErrors(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
if err := os.WriteFile(path, []byte("{ this is not valid json"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
_, err := LoadRegistryAt(path)
|
||||
if err == nil {
|
||||
t.Fatal("LoadRegistryAt on corrupt file returned nil error")
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Record_AddsNewProject verifies the first-record
|
||||
// path: a new path gets a fresh entry with FirstSeen == LastSeen
|
||||
// and SessionCount == 1.
|
||||
func TestRegistry_Record_AddsNewProject(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
reg, _ := LoadRegistryAt(path)
|
||||
if err := reg.Record("/home/user/git/foo"); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
if len(reg.Projects) != 1 {
|
||||
t.Fatalf("len(Projects) = %d, want 1", len(reg.Projects))
|
||||
}
|
||||
p := reg.Projects[0]
|
||||
if p.Path != "/home/user/git/foo" {
|
||||
t.Errorf("Path = %q, want /home/user/git/foo", p.Path)
|
||||
}
|
||||
if !p.FirstSeen.Equal(p.LastSeen) {
|
||||
t.Errorf("FirstSeen=%v != LastSeen=%v (should be equal on first record)", p.FirstSeen, p.LastSeen)
|
||||
}
|
||||
if p.SessionCount != 1 {
|
||||
t.Errorf("SessionCount = %d, want 1", p.SessionCount)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Record_BumpsExistingProject verifies the
|
||||
// second-record path: a project that's already in the registry
|
||||
// gets LastSeen updated and SessionCount incremented; FirstSeen
|
||||
// is preserved.
|
||||
func TestRegistry_Record_BumpsExistingProject(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
reg, _ := LoadRegistryAt(path)
|
||||
if err := reg.Record("/home/user/git/foo"); err != nil {
|
||||
t.Fatalf("first Record: %v", err)
|
||||
}
|
||||
firstSeen := reg.Projects[0].FirstSeen
|
||||
|
||||
// Sleep briefly so the second Record observes a strictly later
// time.Now(). time.Time comparisons have nanosecond resolution, so
// a 2 ms gap is plenty.
|
||||
time.Sleep(2 * time.Millisecond)
|
||||
if err := reg.Record("/home/user/git/foo"); err != nil {
|
||||
t.Fatalf("second Record: %v", err)
|
||||
}
|
||||
if len(reg.Projects) != 1 {
|
||||
t.Fatalf("len(Projects) = %d, want 1 (no duplicate)", len(reg.Projects))
|
||||
}
|
||||
p := reg.Projects[0]
|
||||
if p.SessionCount != 2 {
|
||||
t.Errorf("SessionCount = %d, want 2", p.SessionCount)
|
||||
}
|
||||
if !p.FirstSeen.Equal(firstSeen) {
|
||||
t.Errorf("FirstSeen changed: %v → %v", firstSeen, p.FirstSeen)
|
||||
}
|
||||
if !p.LastSeen.After(firstSeen) {
|
||||
t.Errorf("LastSeen=%v not after FirstSeen=%v", p.LastSeen, firstSeen)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Record_EmptyPathReturnsError verifies the
|
||||
// input-validation path. An empty project root is a programmer
|
||||
// error, not a silent no-op.
|
||||
func TestRegistry_Record_EmptyPathReturnsError(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
reg, _ := LoadRegistryAt(path)
|
||||
|
||||
if err := reg.Record(""); err == nil {
|
||||
t.Error("Record(\"\") returned nil error, want error")
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Record_AtomicWriteLeavesNoTemp verifies the
|
||||
// atomic-write hygiene: after a successful Record, no .tmp-*
|
||||
// file is left in the directory.
|
||||
func TestRegistry_Record_AtomicWriteLeavesNoTemp(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
reg, _ := LoadRegistryAt(path)
|
||||
if err := reg.Record("/home/user/git/foo"); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
|
||||
entries, err := os.ReadDir(dir)
|
||||
if err != nil {
|
||||
t.Fatalf("ReadDir: %v", err)
|
||||
}
|
||||
for _, e := range entries {
|
||||
if e.Name() != "projects.json" {
|
||||
t.Errorf("unexpected leftover file: %q", e.Name())
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Record_PersistsAcrossReload verifies the
|
||||
// save/load contract: a Record followed by a fresh Load
|
||||
// returns the updated data.
|
||||
func TestRegistry_Record_PersistsAcrossReload(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
reg, _ := LoadRegistryAt(path)
|
||||
if err := reg.Record("/home/user/git/foo"); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
if err := reg.Record("/home/user/git/bar"); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
|
||||
// Fresh load (simulates a new process).
|
||||
reloaded, err := LoadRegistryAt(path)
|
||||
if err != nil {
|
||||
t.Fatalf("re-Load: %v", err)
|
||||
}
|
||||
if len(reloaded.Projects) != 2 {
|
||||
t.Fatalf("len(Projects) = %d, want 2", len(reloaded.Projects))
|
||||
}
|
||||
// Order is not guaranteed; check both paths present.
|
||||
paths := []string{reloaded.Projects[0].Path, reloaded.Projects[1].Path}
|
||||
sort.Strings(paths)
|
||||
want := []string{"/home/user/git/bar", "/home/user/git/foo"}
|
||||
for i, p := range want {
|
||||
if paths[i] != p {
|
||||
t.Errorf("paths[%d] = %q, want %q", i, paths[i], p)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Save_CreatesDirectoryIfMissing verifies the
// "first save" path: the registry file lives in a directory
// that may not exist yet. Save should create the directory
// rather than fail.
func TestRegistry_Save_CreatesDirectoryIfMissing(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
deepPath := filepath.Join(dir, "nested", "deeper", "projects.json")
|
||||
|
||||
reg, _ := LoadRegistryAt(deepPath)
|
||||
if err := reg.Record("/home/user/git/foo"); err != nil {
|
||||
t.Fatalf("Record: %v", err)
|
||||
}
|
||||
if _, err := os.Stat(deepPath); err != nil {
|
||||
t.Errorf("expected file at %s, got %v", deepPath, err)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Prune_RemovesStaleEntries verifies the core
|
||||
// pruning semantic: entries with LastSeen older than the
|
||||
// cutoff are removed; the rest are kept.
|
||||
func TestRegistry_Prune_RemovesStaleEntries(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
now := time.Now().UTC()
|
||||
reg := &Registry{path: path, Projects: []ProjectEntry{
|
||||
{Path: "/stale/1", FirstSeen: now.Add(-100 * 24 * time.Hour), LastSeen: now.Add(-90 * 24 * time.Hour), SessionCount: 5},
|
||||
{Path: "/fresh/1", FirstSeen: now.Add(-1 * 24 * time.Hour), LastSeen: now.Add(-1 * time.Hour), SessionCount: 10},
|
||||
{Path: "/stale/2", FirstSeen: now.Add(-200 * 24 * time.Hour), LastSeen: now.Add(-60 * 24 * time.Hour), SessionCount: 1},
|
||||
{Path: "/fresh/2", FirstSeen: now, LastSeen: now, SessionCount: 1},
|
||||
}}
|
||||
|
||||
pruned, err := reg.Prune(30 * 24 * time.Hour) // 30 days
|
||||
if err != nil {
|
||||
t.Fatalf("Prune: %v", err)
|
||||
}
|
||||
if len(pruned) != 2 {
|
||||
t.Errorf("len(pruned) = %d, want 2 (got %v)", len(pruned), pruned)
|
||||
}
|
||||
if len(reg.Projects) != 2 {
|
||||
t.Errorf("len(Projects) = %d, want 2", len(reg.Projects))
|
||||
}
|
||||
for _, p := range reg.Projects {
|
||||
if !strings.HasPrefix(p.Path, "/fresh/") {
|
||||
t.Errorf("stale project %q survived prune", p.Path)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Prune_KeepsRecentEntries documents the inverse
|
||||
// case: nothing to prune returns an empty list and no save.
|
||||
func TestRegistry_Prune_KeepsRecentEntries(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
now := time.Now().UTC()
|
||||
reg := &Registry{path: path, Projects: []ProjectEntry{
|
||||
{Path: "/fresh/1", FirstSeen: now, LastSeen: now, SessionCount: 1},
|
||||
{Path: "/fresh/2", FirstSeen: now, LastSeen: now.Add(-1 * time.Hour), SessionCount: 2},
|
||||
}}
|
||||
|
||||
pruned, err := reg.Prune(30 * 24 * time.Hour)
|
||||
if err != nil {
|
||||
t.Fatalf("Prune: %v", err)
|
||||
}
|
||||
if len(pruned) != 0 {
|
||||
t.Errorf("len(pruned) = %d, want 0 (got %v)", len(pruned), pruned)
|
||||
}
|
||||
if len(reg.Projects) != 2 {
|
||||
t.Errorf("len(Projects) = %d, want 2", len(reg.Projects))
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Prune_ReportsPrunedPaths verifies the return
|
||||
// value: the pruned paths are returned to the caller for
|
||||
// reporting (e.g. `gnoma doctor` could surface this).
|
||||
func TestRegistry_Prune_ReportsPrunedPaths(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
now := time.Now().UTC()
|
||||
reg := &Registry{path: path, Projects: []ProjectEntry{
|
||||
{Path: "/z/last-stale", FirstSeen: now.Add(-100 * 24 * time.Hour), LastSeen: now.Add(-90 * 24 * time.Hour)},
|
||||
{Path: "/a/first-stale", FirstSeen: now.Add(-200 * 24 * time.Hour), LastSeen: now.Add(-60 * 24 * time.Hour)},
|
||||
}}
|
||||
|
||||
pruned, _ := reg.Prune(30 * 24 * time.Hour)
|
||||
if len(pruned) != 2 {
|
||||
t.Fatalf("len(pruned) = %d, want 2", len(pruned))
|
||||
}
|
||||
// Sorted for deterministic caller output.
|
||||
if pruned[0] != "/a/first-stale" || pruned[1] != "/z/last-stale" {
|
||||
t.Errorf("pruned = %v, want sorted [/a/first-stale /z/last-stale]", pruned)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Prune_EmptyRegistryIsNoOp verifies the
|
||||
// "nothing to prune" edge case on an empty registry.
|
||||
func TestRegistry_Prune_EmptyRegistryIsNoOp(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
reg := &Registry{path: path}
|
||||
|
||||
pruned, err := reg.Prune(30 * 24 * time.Hour)
|
||||
if err != nil {
|
||||
t.Fatalf("Prune: %v", err)
|
||||
}
|
||||
if len(pruned) != 0 {
|
||||
t.Errorf("len(pruned) = %d, want 0", len(pruned))
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistry_Prune_PersistsAcrossReload verifies that the
|
||||
// pruned state is written to disk and visible after a fresh
|
||||
// LoadRegistryAt. The save happens inside Prune; the reload
|
||||
// confirms it.
|
||||
func TestRegistry_Prune_PersistsAcrossReload(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "projects.json")
|
||||
|
||||
now := time.Now().UTC()
|
||||
reg := &Registry{path: path, Projects: []ProjectEntry{
|
||||
{Path: "/stale", FirstSeen: now.Add(-100 * 24 * time.Hour), LastSeen: now.Add(-90 * 24 * time.Hour)},
|
||||
{Path: "/fresh", FirstSeen: now, LastSeen: now},
|
||||
}}
|
||||
if _, err := reg.Prune(30 * 24 * time.Hour); err != nil {
|
||||
t.Fatalf("Prune: %v", err)
|
||||
}
|
||||
|
||||
reloaded, err := LoadRegistryAt(path)
|
||||
if err != nil {
|
||||
t.Fatalf("re-Load: %v", err)
|
||||
}
|
||||
if len(reloaded.Projects) != 1 {
|
||||
t.Errorf("len(Projects) after reload = %d, want 1", len(reloaded.Projects))
|
||||
}
|
||||
if len(reloaded.Projects) == 1 && reloaded.Projects[0].Path != "/fresh" {
|
||||
t.Errorf("reloaded project = %q, want /fresh", reloaded.Projects[0].Path)
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,223 @@
|
||||
package config
|
||||
|
||||
import "time"
|
||||
|
||||
// ResolvedConfig is the post-Load view of a Config: every pointer
|
||||
// field has been dereferenced with the default substituted for nil.
|
||||
// Consumers should read cfg.Resolved().X for the fields listed in
|
||||
// the resolver table; raw cfg.X remains valid for the string / map /
|
||||
// slice fields that kept their non-pointer types and are read at
|
||||
// their call site.
|
||||
//
|
||||
// This mirrors the ResolvedSafetySection pattern: a separate mirror
|
||||
// type whose construction is the boundary where "user omitted the
|
||||
// key" and "user set it to the zero value" stop being ambiguous.
|
||||
//
|
||||
// Fields that are not pointer-converted (string / map / slice /
|
||||
// BanditSection) are intentionally omitted from the mirror — call
|
||||
// sites read them directly from the source Config.
|
||||
type ResolvedConfig struct {
|
||||
// ProjectRegistry mirrors Config.Settings.ProjectRegistry
// (the `[config].project_registry` key). nil resolves to the
// default (true, registry enabled); *false disables the
// registry. It lives at the top level of the mirror because
// it gates a gnoma-wide behavior (writing to projects.json),
// not a single section's behavior.
|
||||
ProjectRegistry bool
|
||||
|
||||
Provider ResolvedProviderSection
|
||||
Tools ResolvedToolsSection
|
||||
Security ResolvedSecuritySection
|
||||
Router ResolvedRouterSection
|
||||
Session ResolvedSessionSection
|
||||
SLM ResolvedSLMSection
|
||||
Hooks []ResolvedHook
|
||||
}
|
||||
|
||||
// ResolvedProviderSection is ProviderSection with all pointer
|
||||
// fields dereferenced.
|
||||
type ResolvedProviderSection struct {
|
||||
Default string
|
||||
Model string
|
||||
MaxTokens int64
|
||||
Temperature *float64
|
||||
APIKeys map[string]string
|
||||
Endpoints map[string]string
|
||||
}
|
||||
|
||||
// ResolvedToolsSection is ToolsSection with pointer fields
|
||||
// dereferenced. BashTimeout is left as a time.Duration so the
|
||||
// `Duration == 0` sentinel "use built-in default" can be checked
|
||||
// by consumers that care.
|
||||
type ResolvedToolsSection struct {
|
||||
BashTimeout time.Duration
|
||||
MaxFileSize int64
|
||||
}
|
||||
|
||||
// ResolvedSecuritySection is SecuritySection with pointer fields
|
||||
// dereferenced.
|
||||
type ResolvedSecuritySection struct {
|
||||
EntropyThreshold float64
|
||||
RedactHighEntropy bool
|
||||
EntropySafelist []string
|
||||
Patterns []PatternConfig
|
||||
}
|
||||
|
||||
// ResolvedRouterSection is RouterSection with pointer fields
|
||||
// dereferenced. Bandit is omitted — its 0-sentinel pattern is
|
||||
// documented at the source struct and read directly via
|
||||
// cfg.Router.Bandit.
|
||||
type ResolvedRouterSection struct {
|
||||
ForceTwoStage bool
|
||||
Prefer string
|
||||
}
|
||||
|
||||
// ResolvedSessionSection is SessionSection with pointer fields
|
||||
// dereferenced.
|
||||
type ResolvedSessionSection struct {
|
||||
MaxKeep int
|
||||
}
|
||||
|
||||
// ResolvedSLMSection is SLMSection with pointer-converted fields
|
||||
// dereferenced. Added in the 2026-06-04 follow-up to Phase 1 of
|
||||
// the config-migration plan — see
|
||||
// docs/superpowers/plans/2026-06-04-config-migration-followups.md.
|
||||
// Enabled stays a plain bool in the source section (the existing
// 0-sentinel pattern still applies). RegisterAsArm was already
// *bool in the source; the mirror resolves nil to true, matching
// the nil→true handling at the call sites (see
// internal/slm/arm.go).
|
||||
type ResolvedSLMSection struct {
|
||||
Enabled bool
|
||||
Backend string
|
||||
Model string
|
||||
BaseURL string
|
||||
ModelURL string
|
||||
DataDir string
|
||||
ExpectedSHA256 string
|
||||
StartupTimeout time.Duration
|
||||
ClassifyTimeout time.Duration
|
||||
RegisterAsArm bool
|
||||
}
|
||||
|
||||
// ResolvedHook is HookConfig with FailOpen dereferenced. All other
|
||||
// fields are pass-through copies.
|
||||
type ResolvedHook struct {
|
||||
Name string
|
||||
Event string
|
||||
Type string
|
||||
Exec string
|
||||
Timeout string
|
||||
FailOpen bool
|
||||
ToolPattern string
|
||||
}
|
||||
|
||||
// Resolved builds a ResolvedConfig from a Config, substituting
|
||||
// Defaults() values for any nil pointer fields. Called once at the
|
||||
// end of LoadWithProfile (and LoadBase) so all consumer code reads
|
||||
// resolved values; raw layered structs are internal.
|
||||
func (c *Config) Resolved() *ResolvedConfig {
|
||||
d := Defaults()
|
||||
|
||||
projectRegistry := true
|
||||
if c.Settings.ProjectRegistry != nil {
|
||||
projectRegistry = *c.Settings.ProjectRegistry
|
||||
}
|
||||
|
||||
provider := ResolvedProviderSection{
|
||||
Default: c.Provider.Default,
|
||||
Model: c.Provider.Model,
|
||||
MaxTokens: *d.Provider.MaxTokens,
|
||||
Temperature: c.Provider.Temperature,
|
||||
APIKeys: c.Provider.APIKeys,
|
||||
Endpoints: c.Provider.Endpoints,
|
||||
}
|
||||
if c.Provider.MaxTokens != nil {
|
||||
provider.MaxTokens = *c.Provider.MaxTokens
|
||||
}
|
||||
|
||||
tools := ResolvedToolsSection{
|
||||
BashTimeout: d.Tools.BashTimeout.Duration(),
|
||||
MaxFileSize: *d.Tools.MaxFileSize,
|
||||
}
|
||||
if c.Tools.BashTimeout != 0 {
|
||||
tools.BashTimeout = c.Tools.BashTimeout.Duration()
|
||||
}
|
||||
if c.Tools.MaxFileSize != nil {
|
||||
tools.MaxFileSize = *c.Tools.MaxFileSize
|
||||
}
|
||||
|
||||
security := ResolvedSecuritySection{
|
||||
EntropyThreshold: *d.Security.EntropyThreshold,
|
||||
RedactHighEntropy: *d.Security.RedactHighEntropy,
|
||||
EntropySafelist: c.Security.EntropySafelist,
|
||||
Patterns: c.Security.Patterns,
|
||||
}
|
||||
if c.Security.EntropyThreshold != nil {
|
||||
security.EntropyThreshold = *c.Security.EntropyThreshold
|
||||
}
|
||||
if c.Security.RedactHighEntropy != nil {
|
||||
security.RedactHighEntropy = *c.Security.RedactHighEntropy
|
||||
}
|
||||
|
||||
router := ResolvedRouterSection{
|
||||
ForceTwoStage: *d.Router.ForceTwoStage,
|
||||
Prefer: c.Router.Prefer,
|
||||
}
|
||||
if c.Router.ForceTwoStage != nil {
|
||||
router.ForceTwoStage = *c.Router.ForceTwoStage
|
||||
}
|
||||
|
||||
session := ResolvedSessionSection{
|
||||
MaxKeep: *d.Session.MaxKeep,
|
||||
}
|
||||
if c.Session.MaxKeep != nil {
|
||||
session.MaxKeep = *c.Session.MaxKeep
|
||||
}
|
||||
|
||||
slm := ResolvedSLMSection{
|
||||
Enabled: c.SLM.Enabled,
|
||||
Backend: c.SLM.Backend,
|
||||
Model: c.SLM.Model,
|
||||
BaseURL: c.SLM.BaseURL,
|
||||
ModelURL: c.SLM.ModelURL,
|
||||
DataDir: c.SLM.DataDir,
|
||||
ExpectedSHA256: c.SLM.ExpectedSHA256,
|
||||
StartupTimeout: d.SLM.StartupTimeout.Duration(),
|
||||
ClassifyTimeout: d.SLM.ClassifyTimeout.Duration(),
|
||||
// RegisterAsArm: nil → default (true), explicit *true → true,
|
||||
// explicit *false → false. The default-true case preserves
|
||||
// pre-config behaviour where the SLM is always registered as
|
||||
// an execution arm in addition to its classifier role.
|
||||
RegisterAsArm: c.SLM.RegisterAsArm == nil || *c.SLM.RegisterAsArm,
|
||||
}
|
||||
if c.SLM.StartupTimeout != nil {
|
||||
slm.StartupTimeout = c.SLM.StartupTimeout.Duration()
|
||||
}
|
||||
if c.SLM.ClassifyTimeout != nil {
|
||||
slm.ClassifyTimeout = c.SLM.ClassifyTimeout.Duration()
|
||||
}
|
||||
|
||||
hooks := make([]ResolvedHook, len(c.Hooks))
|
||||
for i, h := range c.Hooks {
|
||||
failOpen := h.FailOpen != nil && *h.FailOpen
|
||||
hooks[i] = ResolvedHook{
|
||||
Name: h.Name,
|
||||
Event: h.Event,
|
||||
Type: h.Type,
|
||||
Exec: h.Exec,
|
||||
Timeout: h.Timeout,
|
||||
FailOpen: failOpen,
|
||||
ToolPattern: h.ToolPattern,
|
||||
}
|
||||
}
|
||||
|
||||
return &ResolvedConfig{
|
||||
ProjectRegistry: projectRegistry,
|
||||
Provider: provider,
|
||||
Tools: tools,
|
||||
Security: security,
|
||||
Router: router,
|
||||
Session: session,
|
||||
SLM: slm,
|
||||
Hooks: hooks,
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,274 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// i64p returns a pointer to its argument. Test helper for
|
||||
// constructing literal `*int64` values without a temporary variable.
|
||||
func i64p(v int64) *int64 { return &v }
|
||||
|
||||
// ip returns a pointer to its argument. Test helper for
|
||||
// constructing literal `*int` values.
|
||||
func ip(v int) *int { return &v }
|
||||
|
||||
// bp returns a pointer to its argument. Test helper for
|
||||
// constructing literal `*bool` values.
|
||||
func bp(v bool) *bool { return &v }
|
||||
|
||||
// fp64 returns a pointer to its argument. Test helper for
|
||||
// constructing literal `*float64` values.
|
||||
func fp64(v float64) *float64 { return &v }
|
||||
|
||||
// TestResolve_SubstitutesDefaultsForNilPointers verifies that pointer
|
||||
// fields left nil after TOML decode (i.e. user didn't set them) get
|
||||
// the default value at resolve time. This is the core of the
|
||||
// zero-spam fix: the file is allowed to omit the field, and the
|
||||
// consumer still sees the default.
|
||||
func TestResolve_SubstitutesDefaultsForNilPointers(t *testing.T) {
|
||||
cfg := &Config{} // zero: every pointer is nil
|
||||
resolved := cfg.Resolved()
|
||||
|
||||
if resolved.Provider.MaxTokens != 8192 {
|
||||
t.Errorf("Resolved.Provider.MaxTokens = %d, want 8192 (default)", resolved.Provider.MaxTokens)
|
||||
}
|
||||
if resolved.Tools.MaxFileSize != 1<<20 {
|
||||
t.Errorf("Resolved.Tools.MaxFileSize = %d, want %d (default)", resolved.Tools.MaxFileSize, 1<<20)
|
||||
}
|
||||
if resolved.Security.EntropyThreshold != 4.5 {
|
||||
t.Errorf("Resolved.Security.EntropyThreshold = %v, want 4.5 (default)", resolved.Security.EntropyThreshold)
|
||||
}
|
||||
if resolved.Security.RedactHighEntropy {
|
||||
t.Errorf("Resolved.Security.RedactHighEntropy = true, want false (default)")
|
||||
}
|
||||
if resolved.Router.ForceTwoStage {
|
||||
t.Errorf("Resolved.Router.ForceTwoStage = true, want false (default)")
|
||||
}
|
||||
if resolved.Session.MaxKeep != 20 {
|
||||
t.Errorf("Resolved.Session.MaxKeep = %d, want 20 (default)", resolved.Session.MaxKeep)
|
||||
}
|
||||
if resolved.Router.Prefer != "" {
|
||||
t.Errorf("Resolved.Router.Prefer = %q, want empty (no default)", resolved.Router.Prefer)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_PreservesExplicitValues verifies that explicit user-set
|
||||
// values (non-nil pointers) survive resolution untouched.
|
||||
func TestResolve_PreservesExplicitValues(t *testing.T) {
|
||||
cfg := &Config{
|
||||
Provider: ProviderSection{
|
||||
MaxTokens: i64p(16384),
|
||||
Temperature: fp64(0.7),
|
||||
},
|
||||
Tools: ToolsSection{
|
||||
MaxFileSize: i64p(2 << 20),
|
||||
},
|
||||
Security: SecuritySection{
|
||||
EntropyThreshold: fp64(5.0),
|
||||
RedactHighEntropy: bp(true),
|
||||
},
|
||||
Router: RouterSection{
|
||||
ForceTwoStage: bp(true),
|
||||
Prefer: "cloud",
|
||||
},
|
||||
Session: SessionSection{
|
||||
MaxKeep: ip(50),
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if resolved.Provider.MaxTokens != 16384 {
|
||||
t.Errorf("Resolved.Provider.MaxTokens = %d, want 16384 (user-set)", resolved.Provider.MaxTokens)
|
||||
}
|
||||
if resolved.Tools.MaxFileSize != 2<<20 {
|
||||
t.Errorf("Resolved.Tools.MaxFileSize = %d, want %d (user-set)", resolved.Tools.MaxFileSize, 2<<20)
|
||||
}
|
||||
if resolved.Security.EntropyThreshold != 5.0 {
|
||||
t.Errorf("Resolved.Security.EntropyThreshold = %v, want 5.0 (user-set)", resolved.Security.EntropyThreshold)
|
||||
}
|
||||
if !resolved.Security.RedactHighEntropy {
|
||||
t.Error("Resolved.Security.RedactHighEntropy = false, want true (user-set)")
|
||||
}
|
||||
if !resolved.Router.ForceTwoStage {
|
||||
t.Error("Resolved.Router.ForceTwoStage = false, want true (user-set)")
|
||||
}
|
||||
if resolved.Router.Prefer != "cloud" {
|
||||
t.Errorf("Resolved.Router.Prefer = %q, want cloud (user-set)", resolved.Router.Prefer)
|
||||
}
|
||||
if resolved.Session.MaxKeep != 50 {
|
||||
t.Errorf("Resolved.Session.MaxKeep = %d, want 50 (user-set)", resolved.Session.MaxKeep)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_ExplicitZeroPreserved verifies that a user who sets
|
||||
// `max_tokens = 0` (a *int64 pointing to 0) gets 0 back from the
|
||||
// resolver — the pointer is non-nil so the default is not substituted.
|
||||
// This is the critical "0 means something the user actually wants"
|
||||
// case the pointer conversion exists to preserve.
|
||||
func TestResolve_ExplicitZeroPreserved(t *testing.T) {
|
||||
cfg := &Config{
|
||||
Provider: ProviderSection{
|
||||
MaxTokens: i64p(0),
|
||||
},
|
||||
Session: SessionSection{
|
||||
MaxKeep: ip(0),
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if resolved.Provider.MaxTokens != 0 {
|
||||
t.Errorf("Resolved.Provider.MaxTokens = %d, want 0 (explicit zero)", resolved.Provider.MaxTokens)
|
||||
}
|
||||
if resolved.Session.MaxKeep != 0 {
|
||||
t.Errorf("Resolved.Session.MaxKeep = %d, want 0 (explicit zero)", resolved.Session.MaxKeep)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_HookFailOpen_NilDefaultsToFalse verifies that a hook
|
||||
// with no `fail_open` key gets the documented default (false) in
|
||||
// resolution. The HookConfig doc-comment says default is false
|
||||
// ("fail closed" / deny-on-error behaviour).
|
||||
func TestResolve_HookFailOpen_NilDefaultsToFalse(t *testing.T) {
|
||||
cfg := &Config{
|
||||
Hooks: []HookConfig{
|
||||
{Name: "log-tools", Event: "pre_tool_use", Type: "command", Exec: "/bin/true"},
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if len(resolved.Hooks) != 1 {
|
||||
t.Fatalf("len(Resolved.Hooks) = %d, want 1", len(resolved.Hooks))
|
||||
}
|
||||
if resolved.Hooks[0].FailOpen {
|
||||
t.Error("Resolved.Hooks[0].FailOpen = true, want false (default)")
|
||||
}
|
||||
if resolved.Hooks[0].Name != "log-tools" {
|
||||
t.Errorf("Resolved.Hooks[0].Name = %q, want log-tools", resolved.Hooks[0].Name)
|
||||
}
|
||||
if resolved.Hooks[0].Exec != "/bin/true" {
|
||||
t.Errorf("Resolved.Hooks[0].Exec = %q, want /bin/true", resolved.Hooks[0].Exec)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_HookFailOpen_ExplicitTrue verifies that a hook with
|
||||
// `fail_open = true` in TOML keeps true in resolution.
|
||||
func TestResolve_HookFailOpen_ExplicitTrue(t *testing.T) {
|
||||
cfg := &Config{
|
||||
Hooks: []HookConfig{
|
||||
{Name: "dangerous", Event: "pre_tool_use", Type: "command", Exec: "/bin/true", FailOpen: bp(true)},
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if !resolved.Hooks[0].FailOpen {
|
||||
t.Error("Resolved.Hooks[0].FailOpen = false, want true (explicit)")
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_NonPointerFieldsPassthrough verifies that string/slice
|
||||
// fields on the mirror are passed through from the source Config
|
||||
// without default substitution. Only the pointer-converted fields
|
||||
// get the resolver treatment; the rest are read directly via cfg.X.
|
||||
func TestResolve_NonPointerFieldsPassthrough(t *testing.T) {
|
||||
cfg := &Config{
|
||||
Provider: ProviderSection{
|
||||
Default: "anthropic",
|
||||
Model: "claude-opus-4-7",
|
||||
},
|
||||
Security: SecuritySection{
|
||||
EntropySafelist: []string{"uuid", "sha_hex"},
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if resolved.Provider.Default != "anthropic" {
|
||||
t.Errorf("Resolved.Provider.Default = %q, want anthropic", resolved.Provider.Default)
|
||||
}
|
||||
if resolved.Provider.Model != "claude-opus-4-7" {
|
||||
t.Errorf("Resolved.Provider.Model = %q, want claude-opus-4-7", resolved.Provider.Model)
|
||||
}
|
||||
if len(resolved.Security.EntropySafelist) != 2 ||
resolved.Security.EntropySafelist[0] != "uuid" ||
resolved.Security.EntropySafelist[1] != "sha_hex" {
|
||||
t.Errorf("Resolved.Security.EntropySafelist = %v, want [uuid sha_hex]", resolved.Security.EntropySafelist)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_SLMSection_StartupTimeoutDefaultsTo5s verifies that
|
||||
// the SLM section's pointer-converted Duration fields (added in the
|
||||
// 2026-06-04 follow-up to Phase 1) get the documented defaults.
|
||||
// StartupTimeout's default is 5s (the llamafile first-launch budget);
|
||||
// ClassifyTimeout's default is 0 (which the SLM layer maps to its
|
||||
// own 15s budget).
|
||||
func TestResolve_SLMSection_StartupTimeoutDefaultsTo5s(t *testing.T) {
|
||||
cfg := &Config{} // every pointer nil
|
||||
resolved := cfg.Resolved()
|
||||
|
||||
if resolved.SLM.StartupTimeout != 5*time.Second {
|
||||
t.Errorf("Resolved.SLM.StartupTimeout = %v, want 5s (default)", resolved.SLM.StartupTimeout)
|
||||
}
|
||||
if resolved.SLM.ClassifyTimeout != 0 {
|
||||
t.Errorf("Resolved.SLM.ClassifyTimeout = %v, want 0 (default — use SLM-layer 15s)", resolved.SLM.ClassifyTimeout)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_SLMSection_ExplicitDurationsPreserved verifies that
|
||||
// user-set Duration values survive resolution untouched.
|
||||
func TestResolve_SLMSection_ExplicitDurationsPreserved(t *testing.T) {
|
||||
startup := Duration(30 * time.Second)
|
||||
classify := Duration(45 * time.Second)
|
||||
cfg := &Config{
|
||||
SLM: SLMSection{
|
||||
StartupTimeout: &startup,
|
||||
ClassifyTimeout: &classify,
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if resolved.SLM.StartupTimeout != 30*time.Second {
|
||||
t.Errorf("Resolved.SLM.StartupTimeout = %v, want 30s (user-set)", resolved.SLM.StartupTimeout)
|
||||
}
|
||||
if resolved.SLM.ClassifyTimeout != 45*time.Second {
|
||||
t.Errorf("Resolved.SLM.ClassifyTimeout = %v, want 45s (user-set)", resolved.SLM.ClassifyTimeout)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_SLMSection_ExplicitZeroPreserved verifies that
|
||||
// *Duration(0) (the documented "use built-in default" sentinel for
|
||||
// both fields) is preserved as 0 in the resolved view.
|
||||
func TestResolve_SLMSection_ExplicitZeroPreserved(t *testing.T) {
|
||||
startup := Duration(0)
|
||||
classify := Duration(0)
|
||||
cfg := &Config{
|
||||
SLM: SLMSection{
|
||||
StartupTimeout: &startup,
|
||||
ClassifyTimeout: &classify,
|
||||
},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if resolved.SLM.StartupTimeout != 0 {
|
||||
t.Errorf("Resolved.SLM.StartupTimeout = %v, want 0 (explicit zero)", resolved.SLM.StartupTimeout)
|
||||
}
|
||||
if resolved.SLM.ClassifyTimeout != 0 {
|
||||
t.Errorf("Resolved.SLM.ClassifyTimeout = %v, want 0 (explicit zero)", resolved.SLM.ClassifyTimeout)
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_ProjectRegistryDefaultsToTrue verifies the
|
||||
// Phase 2 mirror: nil pointer → default (true, registry
|
||||
// enabled). Preserves the v0.3.x "always record" behavior.
|
||||
func TestResolve_ProjectRegistryDefaultsToTrue(t *testing.T) {
|
||||
cfg := &Config{}
|
||||
resolved := cfg.Resolved()
|
||||
if !resolved.ProjectRegistry {
|
||||
t.Errorf("Resolved.ProjectRegistry = false, want true (default)")
|
||||
}
|
||||
}
|
||||
|
||||
// TestResolve_ProjectRegistry_ExplicitFalse verifies that a
|
||||
// user who sets `[config].project_registry = false` gets
|
||||
// false in the resolved view.
|
||||
func TestResolve_ProjectRegistry_ExplicitFalse(t *testing.T) {
|
||||
v := false
|
||||
cfg := &Config{
|
||||
Settings: SettingsSection{ProjectRegistry: &v},
|
||||
}
|
||||
resolved := cfg.Resolved()
|
||||
if resolved.ProjectRegistry {
|
||||
t.Errorf("Resolved.ProjectRegistry = true, want false (explicit opt-out)")
|
||||
}
|
||||
}
|
||||
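The four pointer helpers at the top of this test file (`i64p`, `ip`, `bp`, `fp64`) differ only in their type. On Go 1.18 or later they could be collapsed into one generic helper. The sketch below assumes that toolchain; `ptr` is a hypothetical name that does not appear in the source.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v. Hypothetical generic
// replacement for the i64p / ip / bp / fp64 test helpers.
func ptr[T any](v T) *T { return &v }

func main() {
	maxTokens := ptr(int64(16384)) // *int64
	failOpen := ptr(true)          // *bool
	threshold := ptr(5.0)          // *float64
	fmt.Println(*maxTokens, *failOpen, *threshold)
}
```

One trade-off: untyped constants take their default type, so `ptr(0)` yields `*int`, and a `*int64` field needs the explicit `ptr(int64(0))`. The typed helpers in the source avoid that conversion at the call site.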
@@ -0,0 +1,298 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"time"
|
||||
|
||||
"github.com/BurntSushi/toml"
|
||||
)
|
||||
|
||||
// UpgradeResult is what Upgrade returns: a description of what
|
||||
// changed, plus a human-readable diff the CLI can print for the
|
||||
// user to verify. BackupPath is empty when no work was done.
|
||||
type UpgradeResult struct {
|
||||
Changed bool
|
||||
BackupPath string
|
||||
Diff string
|
||||
}
|
||||
|
||||
// Upgrade reads the config at path, applies the cleaning pass
|
||||
// (drops fields whose value matches the resolved default, leaves
|
||||
// explicit-zero pointer fields alone), and atomically writes the
|
||||
// cleaned form to the same path. The original is preserved at
|
||||
// `<path>.bak-YYYYMMDD-HHMMSS`.
|
||||
//
|
||||
// Single-file mode only — `--all-projects` is deferred to the
|
||||
// Phase 2 project registry work in the 2026-05-24 config-
|
||||
// migration plan.
|
||||
//
|
||||
// The cleaning rules per field type:
|
||||
//
|
||||
// - Pointer-converted fields: drop (set to nil) iff the
|
||||
// resolved value equals the resolved default. Explicit-zero
|
||||
// pointer values that differ from the default are kept.
|
||||
//
|
||||
// - Non-pointer string / map / slice fields: encoder's
|
||||
// `omitempty` already drops Go-zero values on rewrite. The
|
||||
// cleaner doesn't need to touch them.
|
||||
//
|
||||
// - Non-pointer numeric / bool fields: same as non-pointer
|
||||
// string — encoder drops Go-zero via `omitempty`. The
|
||||
// documented 0-sentinel pattern (e.g. `TUI.Vim`, `Bandit`)
|
||||
// intentionally has Go zero == default, so this is correct.
|
||||
//
|
||||
// The contract: the resolved view of the cleaned file is
|
||||
// byte-identical to the resolved view of the original (modulo
|
||||
// cosmetic whitespace). Idempotency test in upgrade_test.go
|
||||
// asserts this.
|
||||
func Upgrade(path string) (UpgradeResult, error) {
|
||||
original, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
return UpgradeResult{}, fmt.Errorf("read config: %w", err)
|
||||
}
|
||||
|
||||
var src Config
|
||||
if _, decErr := toml.Decode(string(original), &src); decErr != nil {
|
||||
return UpgradeResult{}, fmt.Errorf("decode config: %w", decErr)
|
||||
}
|
||||
|
||||
// Encode the *original* (uncleaned) state for diff/compare
|
||||
// BEFORE clean() mutates the struct in place.
|
||||
var beforeBuf bytes.Buffer
|
||||
if err := toml.NewEncoder(&beforeBuf).Encode(&src); err != nil {
|
||||
return UpgradeResult{}, fmt.Errorf("encode before: %w", err)
|
||||
}
|
||||
|
||||
clean(&src)
|
||||
|
||||
// Encode the cleaned state.
|
||||
var afterBuf bytes.Buffer
|
||||
if err := toml.NewEncoder(&afterBuf).Encode(&src); err != nil {
|
||||
return UpgradeResult{}, fmt.Errorf("encode after: %w", err)
|
||||
}
|
||||
before := beforeBuf.Bytes()
|
||||
after := afterBuf.Bytes()
|
||||
|
||||
if bytes.Equal(before, after) {
|
||||
return UpgradeResult{Changed: false}, nil
|
||||
}
|
||||
|
||||
// Two-step replace: rename the original to .bak-<timestamp>,
// then atomic-write the new content to the original path. The
// original is never deleted: if the rename fails it is still
// at path, and if the write fails it is still at backupPath
// (and is moved back below on a best-effort basis).
|
||||
backupPath, err := backupPathFor(path)
|
||||
if err != nil {
|
||||
return UpgradeResult{}, err
|
||||
}
|
||||
if err := os.Rename(path, backupPath); err != nil {
|
||||
return UpgradeResult{}, fmt.Errorf("rename original to backup: %w", err)
|
||||
}
|
||||
if err := writeAtomicBytes(path, after); err != nil {
|
||||
// The rename above already moved the original to
// backupPath, so the canonical path no longer holds it.
// Best-effort restore: try to move the backup back so the
// user's config isn't lost. If this rename also fails, the
// original is still recoverable from backupPath.
|
||||
_ = os.Rename(backupPath, path)
|
||||
return UpgradeResult{}, fmt.Errorf("write cleaned config: %w", err)
|
||||
}
|
||||
|
||||
return UpgradeResult{
|
||||
Changed: true,
|
||||
BackupPath: backupPath,
|
||||
Diff: lineDiff(string(before), string(after)),
|
||||
}, nil
|
||||
}
|
||||
|
||||
// clean nulls pointer-converted fields in place wherever the
// value matches the resolved default, and returns the same
// *Config for convenience. Non-pointer fields are left
// unchanged; the encoder's `omitempty` handles their Go-zero
// cases on write.
//
// clean does not allocate a fresh Config: a shallow copy
// would still alias the same pointer targets (e.g. the
// *int64 behind `cfg.Provider.MaxTokens`), so mutating the
// caller's struct with selective nulling and returning it
// keeps the data flow obvious.
|
||||
func clean(cfg *Config) *Config {
|
||||
d := Defaults()
|
||||
resolvedSrc := cfg.Resolved()
|
||||
resolvedDef := d.Resolved()
|
||||
|
||||
// Provider.MaxTokens
|
||||
if cfg.Provider.MaxTokens != nil && resolvedSrc.Provider.MaxTokens == resolvedDef.Provider.MaxTokens {
|
||||
cfg.Provider.MaxTokens = nil
|
||||
}
|
||||
|
||||
// Tools.MaxFileSize
|
||||
if cfg.Tools.MaxFileSize != nil && resolvedSrc.Tools.MaxFileSize == resolvedDef.Tools.MaxFileSize {
|
||||
cfg.Tools.MaxFileSize = nil
|
||||
}
|
||||
|
||||
// Security.EntropyThreshold
|
||||
if cfg.Security.EntropyThreshold != nil && resolvedSrc.Security.EntropyThreshold == resolvedDef.Security.EntropyThreshold {
|
||||
cfg.Security.EntropyThreshold = nil
|
||||
}
|
||||
// Security.RedactHighEntropy
|
||||
if cfg.Security.RedactHighEntropy != nil && resolvedSrc.Security.RedactHighEntropy == resolvedDef.Security.RedactHighEntropy {
|
||||
cfg.Security.RedactHighEntropy = nil
|
||||
}
|
||||
|
||||
// Router.ForceTwoStage
|
||||
if cfg.Router.ForceTwoStage != nil && resolvedSrc.Router.ForceTwoStage == resolvedDef.Router.ForceTwoStage {
|
||||
cfg.Router.ForceTwoStage = nil
|
||||
}
|
||||
|
||||
// Session.MaxKeep
|
||||
if cfg.Session.MaxKeep != nil && resolvedSrc.Session.MaxKeep == resolvedDef.Session.MaxKeep {
|
||||
cfg.Session.MaxKeep = nil
|
||||
}
|
||||
|
||||
// SLM.StartupTimeout / SLM.ClassifyTimeout
|
||||
if cfg.SLM.StartupTimeout != nil && resolvedSrc.SLM.StartupTimeout == resolvedDef.SLM.StartupTimeout {
|
||||
cfg.SLM.StartupTimeout = nil
|
||||
}
|
||||
if cfg.SLM.ClassifyTimeout != nil && resolvedSrc.SLM.ClassifyTimeout == resolvedDef.SLM.ClassifyTimeout {
|
||||
cfg.SLM.ClassifyTimeout = nil
|
||||
}
|
||||
// SLM.RegisterAsArm: default is true; only null when
|
||||
// explicitly set to true (the default-true case).
|
||||
if cfg.SLM.RegisterAsArm != nil && *cfg.SLM.RegisterAsArm == resolvedDef.SLM.RegisterAsArm {
|
||||
cfg.SLM.RegisterAsArm = nil
|
||||
}
|
||||
|
||||
// HookConfig.FailOpen per entry
|
||||
for i := range cfg.Hooks {
|
||||
if cfg.Hooks[i].FailOpen != nil && !resolvedSrc.Hooks[i].FailOpen {
|
||||
// Default for FailOpen is false; null when explicitly false.
|
||||
cfg.Hooks[i].FailOpen = nil
|
||||
}
|
||||
}
|
||||
|
||||
return cfg
|
||||
}
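The rule that `clean` applies field by field can be sketched in isolation. This is a minimal sketch, not the code from this change: `resolve` and `dropIfDefault` are hypothetical helper names, and the only value taken from the source is the documented `max_tokens` default of 8192.

```go
package main

import "fmt"

// resolve returns *p when the user set the field, and def otherwise.
// Hypothetical stand-in for what Resolved() does per field.
func resolve[T any](p *T, def T) T {
	if p == nil {
		return def
	}
	return *p
}

// dropIfDefault returns nil when p is set and equals the default,
// and returns p unchanged in every other case. Hypothetical helper.
func dropIfDefault[T comparable](p *T, def T) *T {
	if p != nil && *p == def {
		return nil
	}
	return p
}

func main() {
	const defaultMaxTokens int64 = 8192
	atDefault, explicitZero := int64(8192), int64(0)

	// Explicitly set to the default: dropped.
	fmt.Println(dropIfDefault(&atDefault, defaultMaxTokens) == nil) // true
	// Explicit zero differs from the default: kept.
	fmt.Println(dropIfDefault(&explicitZero, defaultMaxTokens) == nil) // false
	// An absent field resolves to the default.
	fmt.Println(resolve[int64](nil, defaultMaxTokens)) // 8192
}
```

A generic helper like this would shorten the repeated `if` blocks, though the explicit per-field form in `clean` keeps each comparison visible.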
|
||||
|
||||
// backupPathFor returns a deterministic timestamped backup path.
|
||||
// Uses the local-time YYYYMMDD-HHMMSS format the original plan
|
||||
// specified, with second-level resolution. Collisions within the
|
||||
// same second are possible (e.g. rapid re-runs) but the
|
||||
// idempotency test exercises the no-second-backup case, so a
|
||||
// collision would still be visible to the user.
|
||||
func backupPathFor(path string) (string, error) {
|
||||
t := time.Now()
|
||||
suffix := t.Format("20060102-150405")
|
||||
return fmt.Sprintf("%s.bak-%s", path, suffix), nil
|
||||
}
|
||||
|
||||
// writeAtomicBytes writes the given bytes to path via temp file
|
||||
// + rename. Used by Upgrade (which has already produced the
|
||||
// bytes) and is a more general version of writeAtomicTOML.
|
||||
func writeAtomicBytes(path string, data []byte) error {
|
||||
dir := filepath.Dir(path)
|
||||
tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
|
||||
if err != nil {
|
||||
return fmt.Errorf("create temp: %w", err)
|
||||
}
|
||||
tmpName := tmp.Name()
|
||||
cleanup := func() { _ = os.Remove(tmpName) }
|
||||
|
||||
if _, err := tmp.Write(data); err != nil {
|
||||
_ = tmp.Close()
|
||||
cleanup()
|
||||
return fmt.Errorf("write temp: %w", err)
|
||||
}
|
||||
if err := tmp.Sync(); err != nil {
|
||||
_ = tmp.Close()
|
||||
cleanup()
|
||||
return fmt.Errorf("sync temp: %w", err)
|
||||
}
|
||||
if err := tmp.Close(); err != nil {
|
||||
cleanup()
|
||||
return fmt.Errorf("close temp: %w", err)
|
||||
}
|
||||
if err := os.Rename(tmpName, path); err != nil {
|
||||
cleanup()
|
||||
return fmt.Errorf("rename temp: %w", err)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// lineDiff returns a simple line-by-line diff between before and
|
||||
// after. Lines removed from before are prefixed with `-`, lines
|
||||
// added in after are prefixed with `+`, unchanged lines are
|
||||
// prefixed with ` ` (space). Header lines give the file lengths.
|
||||
//
|
||||
// Not a true Myers / Hunt–Szymanski diff — a long edit can
|
||||
// produce noisy output. Adequate for the gnoma use case where
|
||||
// config files are small (tens of lines) and the user wants
|
||||
// visual confirmation that the cleaning is doing the right
|
||||
// thing. If a more sophisticated diff is ever needed,
|
||||
// `github.com/pmezard/go-difflib` is already a transitive dep
|
||||
// (see go.sum) and can be vendored.
|
||||
func lineDiff(before, after string) string {
|
||||
var b bytes.Buffer
|
||||
fmt.Fprintf(&b, "--- before (%d bytes)\n", len(before))
|
||||
fmt.Fprintf(&b, "+++ after (%d bytes)\n", len(after))
|
||||
bs := splitLines(before)
|
||||
as := splitLines(after)
|
||||
|
||||
// Naive greedy walk, not a real longest-common-subsequence
|
||||
// diff, which is acceptable because config files are small.
|
||||
// Equal lines are emitted as context. Otherwise, a line from
|
||||
// after that no longer occurs in the rest of before is
|
||||
// emitted as `+`; any other line from before is emitted as `-`.
|
||||
i, j := 0, 0
|
||||
for i < len(bs) || j < len(as) {
|
||||
switch {
|
||||
case i < len(bs) && j < len(as) && bs[i] == as[j]:
|
||||
fmt.Fprintf(&b, " %s\n", bs[i])
|
||||
i++
|
||||
j++
|
||||
case j < len(as) && (i == len(bs) || !contains(bs[i:], as[j])):
|
||||
fmt.Fprintf(&b, "+ %s\n", as[j])
|
||||
j++
|
||||
case i < len(bs):
|
||||
fmt.Fprintf(&b, "- %s\n", bs[i])
|
||||
i++
|
||||
}
|
||||
}
|
||||
return b.String()
|
||||
}
|
||||
|
||||
// splitLines returns the lines of s without their terminators;
|
||||
// a trailing '\n' adds no final empty line. The result is suitable for
|
||||
// line-by-line diffing.
|
||||
func splitLines(s string) []string {
|
||||
if s == "" {
|
||||
return nil
|
||||
}
|
||||
out := []string{}
|
||||
start := 0
|
||||
for i := 0; i < len(s); i++ {
|
||||
if s[i] == '\n' {
|
||||
out = append(out, s[start:i])
|
||||
start = i + 1
|
||||
}
|
||||
}
|
||||
if start < len(s) {
|
||||
out = append(out, s[start:])
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// contains reports whether v appears in s. Used by lineDiff to
|
||||
// check whether a line from after still occurs later in before.
|
||||
func contains(s []string, v string) bool {
|
||||
for _, x := range s {
|
||||
if x == v {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
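To see what the greedy walk in `lineDiff` produces, here is a self-contained sketch that mirrors its three cases on pre-split lines. The names `greedyDiff` and `containsLine` are illustrative, not from the change, and the two-space prefix for unchanged lines is an assumption about the intended alignment.

```go
package main

import (
	"fmt"
	"strings"
)

// containsLine reports whether v appears anywhere in lines.
func containsLine(lines []string, v string) bool {
	for _, x := range lines {
		if x == v {
			return true
		}
	}
	return false
}

// greedyDiff mirrors the walk in lineDiff: equal lines are emitted as
// context, an after-line that no longer occurs in the rest of before is
// emitted with "+", and any other before-line is emitted with "-".
func greedyDiff(before, after []string) string {
	var b strings.Builder
	i, j := 0, 0
	for i < len(before) || j < len(after) {
		switch {
		case i < len(before) && j < len(after) && before[i] == after[j]:
			fmt.Fprintf(&b, "  %s\n", before[i])
			i++
			j++
		case j < len(after) && (i == len(before) || !containsLine(before[i:], after[j])):
			fmt.Fprintf(&b, "+ %s\n", after[j])
			j++
		default:
			// Reached only while i < len(before); see the loop condition.
			fmt.Fprintf(&b, "- %s\n", before[i])
			i++
		}
	}
	return b.String()
}

func main() {
	before := []string{"[provider]", `default = "anthropic"`, "max_tokens = 8192"}
	after := []string{"[provider]", `default = "anthropic"`}
	fmt.Print(greedyDiff(before, after))
	// Output:
	//   [provider]
	//   default = "anthropic"
	// - max_tokens = 8192
}
```

This is the shape of output the upgrade command would show for the idempotency test's input: two context lines and one removal.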
@@ -0,0 +1,309 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// TestUpgrade_DropsPointerFieldAtDefault verifies the core
|
||||
// cleaning semantic for pointer-converted fields: a file
|
||||
// containing `max_tokens = 8192` (the documented default, user
|
||||
// explicitly set to it) gets the field nulled in the rewritten
|
||||
// file. The cleaner compares resolved values; matching the
|
||||
// default means the field is dropped.
|
||||
//
|
||||
// Non-pointer string fields (like `mode = ""`) are dropped
|
||||
// automatically by the encoder's `omitempty` on the
|
||||
// read+rewrite cycle, so they don't need the cleaner's help.
|
||||
// This test focuses on the pointer-converted case that the
|
||||
// cleaner was designed for.
|
||||
func TestUpgrade_DropsPointerFieldAtDefault(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
original := "[provider]\nmax_tokens = 8192\n"
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
res, err := Upgrade(path)
|
||||
if err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
if !res.Changed {
|
||||
t.Errorf("Upgrade.Changed = false, want true (max_tokens at default should be dropped)")
|
||||
}
|
||||
|
||||
got, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
t.Fatalf("read upgraded: %v", err)
|
||||
}
|
||||
body := string(got)
|
||||
|
||||
if strings.Contains(body, "max_tokens") {
|
||||
t.Errorf("max_tokens at default not dropped, got:\n%s", body)
|
||||
}
|
||||
if strings.Contains(body, "[provider]") {
|
||||
t.Errorf("[provider] block should be omitted after cleaning, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_KeepsExplicitUserValues verifies that user-set
|
||||
// non-default values survive the cleaning untouched.
|
||||
func TestUpgrade_KeepsExplicitUserValues(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
original := `[provider]
|
||||
default = "anthropic"
|
||||
max_tokens = 16384
|
||||
|
||||
[permission]
|
||||
mode = "deny"
|
||||
`
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if _, err := Upgrade(path); err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
|
||||
got, _ := os.ReadFile(path)
|
||||
body := string(got)
|
||||
|
||||
for _, want := range []string{
|
||||
`default = "anthropic"`,
|
||||
`max_tokens = 16384`,
|
||||
`mode = "deny"`,
|
||||
} {
|
||||
if !strings.Contains(body, want) {
|
||||
t.Errorf("cleaned file missing %q, got:\n%s", want, body)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_KeepsExplicitZeroPointerFields verifies the
|
||||
// pointer-conversion contract: a user who sets `*int64(0)`
|
||||
// explicitly (resolved to 0, which differs from the default
|
||||
// 8192) keeps the field in the cleaned file. This is the
|
||||
// "explicit zero preserved" case the Phase 1 hybrid exists for.
|
||||
func TestUpgrade_KeepsExplicitZeroPointerFields(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
original := `[provider]
|
||||
max_tokens = 0
|
||||
`
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if _, err := Upgrade(path); err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
|
||||
got, _ := os.ReadFile(path)
|
||||
body := string(got)
|
||||
|
||||
if !strings.Contains(body, "max_tokens = 0") {
|
||||
t.Errorf("explicit zero max_tokens = 0 was dropped, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_BackupFileCreated verifies the atomic two-step
|
||||
// write: the original is renamed to `<path>.bak-YYYYMMDD-HHMMSS`
|
||||
// and the cleaned content lands at the original path. The
|
||||
// timestamp suffix is deterministic enough to pattern-match.
|
||||
func TestUpgrade_BackupFileCreated(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
// Use a pointer-converted field at the default so the cleaner
|
||||
// actually mutates the struct (and Changed becomes true).
|
||||
original := "[provider]\nmax_tokens = 8192\n"
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
res, err := Upgrade(path)
|
||||
if err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
if !res.Changed {
|
||||
t.Skip("no change, can't test backup creation")
|
||||
}
|
||||
if res.BackupPath == "" {
|
||||
t.Errorf("Upgrade.BackupPath = empty, want non-empty")
|
||||
}
|
||||
if !strings.HasPrefix(res.BackupPath, path+".bak-") {
|
||||
t.Errorf("BackupPath = %q, want prefix %q", res.BackupPath, path+".bak-")
|
||||
}
|
||||
backup, err := os.ReadFile(res.BackupPath)
|
||||
if err != nil {
|
||||
t.Fatalf("read backup: %v", err)
|
||||
}
|
||||
if string(backup) != original {
|
||||
t.Errorf("backup content = %q, want %q", backup, original)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_Idempotent verifies the core promise: running
|
||||
// upgrade twice on the same file produces a no-op the second
|
||||
// time. No second backup is created; the file content is
|
||||
// unchanged; the result reports Changed=false on the second run.
|
||||
func TestUpgrade_Idempotent(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
// Mix: one explicit user value (default = "anthropic") and
|
||||
// one pointer-converted field at the default (max_tokens = 8192).
|
||||
// The cleaner drops the max_tokens; the user value is kept.
|
||||
original := "[provider]\ndefault = \"anthropic\"\nmax_tokens = 8192\n"
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
first, err := Upgrade(path)
|
||||
if err != nil {
|
||||
t.Fatalf("first Upgrade: %v", err)
|
||||
}
|
||||
if !first.Changed {
|
||||
t.Errorf("first Upgrade.Changed = false, want true")
|
||||
}
|
||||
|
||||
second, err := Upgrade(path)
|
||||
if err != nil {
|
||||
t.Fatalf("second Upgrade: %v", err)
|
||||
}
|
||||
if second.Changed {
|
||||
t.Errorf("second Upgrade.Changed = true, want false (idempotent)")
|
||||
}
|
||||
if second.BackupPath != "" {
|
||||
t.Errorf("second Upgrade.BackupPath = %q, want empty (no second backup)", second.BackupPath)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_NoChangesOnAlreadyCleanFile verifies the no-op
|
||||
// case: a file that already has only user-set non-default
|
||||
// values produces Changed=false and no backup. This is the
|
||||
// baseline — the user runs upgrade-config and gets told
|
||||
// "nothing to do".
|
||||
func TestUpgrade_NoChangesOnAlreadyCleanFile(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
clean := "[provider]\ndefault = \"anthropic\"\n"
|
||||
if err := os.WriteFile(path, []byte(clean), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
res, err := Upgrade(path)
|
||||
if err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
if res.Changed {
|
||||
t.Errorf("Upgrade.Changed = true on already-clean file")
|
||||
}
|
||||
if res.BackupPath != "" {
|
||||
t.Errorf("Upgrade.BackupPath = %q, want empty", res.BackupPath)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_DiffPopulatedWhenChanged verifies the human-readable
|
||||
// diff is populated whenever the file changed. CLI prints this
|
||||
// for the user to verify the cleaning is doing the right thing.
|
||||
func TestUpgrade_DiffPopulatedWhenChanged(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
// Use a pointer-converted field at the default so Changed=true.
|
||||
if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
res, err := Upgrade(path)
|
||||
if err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
if !res.Changed {
|
||||
t.Skip("no change, can't test diff content")
|
||||
}
|
||||
if res.Diff == "" {
|
||||
t.Errorf("Upgrade.Diff = empty, want non-empty when Changed=true")
|
||||
}
|
||||
if !strings.Contains(res.Diff, "max_tokens") {
|
||||
t.Errorf("Diff does not mention the changed field, got:\n%s", res.Diff)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_PreservesDurationFields verifies the
|
||||
// 2026-06-04 Caveat 1 fix interacts correctly with the cleaner:
|
||||
// a user-set Duration (e.g. classify_timeout = "20s") is kept
|
||||
// because it's not the default (the default is *Duration(0) for
|
||||
// ClassifyTimeout, mapped to time.Duration(0) at the resolver).
|
||||
func TestUpgrade_PreservesDurationFields(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
original := "[slm]\nclassify_timeout = \"20s\"\n"
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if _, err := Upgrade(path); err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
|
||||
got, _ := os.ReadFile(path)
|
||||
body := string(got)
|
||||
|
||||
if !strings.Contains(body, "classify_timeout") {
|
||||
t.Errorf("user-set Duration was dropped, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestUpgrade_KeepsExplicitZeroDuration documents the *opposite*
|
||||
// of the "drops" cases: a file with `startup_timeout = 0` (the
|
||||
// previous zero-spam from the pre-Caveat-1 int64 encoder) is
|
||||
// KEPT, because the resolved value via *Duration is 0 which
|
||||
// differs from the documented default of 5s. The user's
|
||||
// explicit-zero is preserved — this is the "explicit zero"
|
||||
// contract the pointer-conversion exists for.
|
||||
func TestUpgrade_KeepsExplicitZeroDuration(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
original := "[slm]\nstartup_timeout = 0\n"
|
||||
if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if _, err := Upgrade(path); err != nil {
|
||||
t.Fatalf("Upgrade: %v", err)
|
||||
}
|
||||
|
||||
got, _ := os.ReadFile(path)
|
||||
body := string(got)
|
||||
|
||||
if !strings.Contains(body, "startup_timeout") {
|
||||
t.Errorf("startup_timeout was dropped (expected kept; resolved 0 != default 5s), got:\n%s", body)
|
||||
}
|
||||
_ = time.Second
|
||||
}
|
||||
|
||||
// TestUpgrade_NonexistentFileIsError verifies the input-validation
|
||||
// path. A missing source file is a user error, not a silent
|
||||
// success.
|
||||
func TestUpgrade_NonexistentFileIsError(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "nonexistent.toml")
|
||||
|
||||
_, err := Upgrade(path)
|
||||
if err == nil {
|
||||
t.Fatal("Upgrade on missing file succeeded, want error")
|
||||
}
|
||||
}
|
||||
+67
-29
@@ -22,24 +22,33 @@ func SetGlobalConfig(key, value string) error {
|
||||
}
|
||||
|
||||
func setConfig(path, key, value string) error {
|
||||
allowed := map[string]bool{
|
||||
"provider.default": true,
|
||||
"provider.model": true,
|
||||
"permission.mode": true,
|
||||
"slm.model_url": true,
|
||||
"slm.enabled": true,
|
||||
"slm.data_dir": true,
|
||||
"tui.theme": true,
|
||||
"tui.vim": true,
|
||||
}
|
||||
if !allowed[key] {
|
||||
return fmt.Errorf("unknown config key %q (supported: %s)", key, strings.Join(allowedKeys(), ", "))
|
||||
if !isAllowedKey(key) {
|
||||
return fmt.Errorf("unknown config key %q (supported: %s)", key, strings.Join(AllowedKeys(), ", "))
|
||||
}
|
||||
|
||||
// Load existing config or start fresh
|
||||
// Ensure directory exists before the read so a fresh project
|
||||
// can be created without a parent .gnoma/ in place.
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
return fmt.Errorf("create config dir: %w", err)
|
||||
}
|
||||
|
||||
// Read existing config into a zero Config; decode overlays
|
||||
// whatever the user has set so the round-trip preserves their
|
||||
// values. Pointer-converted fields decode as `nil` when the key
|
||||
// is absent and as `*T(...)` when present; omitempty on the
|
||||
// encoder keeps absent fields out of the rewritten file. This
|
||||
// is the fix for the zero-spam silent-corruption bug: a fresh
|
||||
// setConfig call no longer emits the entire zero-valued struct.
|
||||
var cfg Config
|
||||
if data, err := os.ReadFile(path); err == nil {
|
||||
toml.Decode(string(data), &cfg) //nolint:errcheck
|
||||
if _, decErr := toml.Decode(string(data), &cfg); decErr != nil {
|
||||
// Existing file is broken; overwrite it with the
|
||||
// caller's change rather than failing closed. The
|
||||
// user's intent for the broken file is "set this
|
||||
// key" — preserving every other corrupt line is
|
||||
// less useful than a clean write.
|
||||
cfg = Config{}
|
||||
}
|
||||
}
|
||||
if cfg.Provider.APIKeys == nil {
|
||||
cfg.Provider.APIKeys = make(map[string]string)
|
||||
@@ -68,29 +77,58 @@ func setConfig(path, key, value string) error {
|
||||
cfg.TUI.Vim = value == "true"
|
||||
}
|
||||
|
||||
// Ensure directory exists
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
return fmt.Errorf("create config dir: %w", err)
|
||||
}
|
||||
return writeAtomicTOML(path, cfg)
|
||||
}
|
||||
|
||||
// Write
|
||||
f, err := os.Create(path)
|
||||
// writeAtomicTOML writes cfg to path via temp-file + rename so a
|
||||
// crash mid-write can never leave a half-written config file at
|
||||
// the canonical path. The temp file lives in the same directory
|
||||
// (so the rename is on the same filesystem) and uses a .tmp-*
|
||||
// suffix that any other reader will skip.
|
||||
func writeAtomicTOML(path string, cfg Config) error {
|
||||
dir := filepath.Dir(path)
|
||||
tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
|
||||
if err != nil {
|
||||
return fmt.Errorf("create config file: %w", err)
|
||||
return fmt.Errorf("create temp config file: %w", err)
|
||||
}
|
||||
enc := toml.NewEncoder(f)
|
||||
encErr := enc.Encode(cfg)
|
||||
closeErr := f.Close()
|
||||
if encErr != nil {
|
||||
return encErr
|
||||
tmpName := tmp.Name()
|
||||
cleanup := func() { _ = os.Remove(tmpName) }
|
||||
|
||||
enc := toml.NewEncoder(tmp)
|
||||
if encErr := enc.Encode(cfg); encErr != nil {
|
||||
_ = tmp.Close()
|
||||
cleanup()
|
||||
return fmt.Errorf("encode config: %w", encErr)
|
||||
}
|
||||
if closeErr != nil {
|
||||
return fmt.Errorf("close config file: %w", closeErr)
|
||||
if err := tmp.Sync(); err != nil {
|
||||
_ = tmp.Close()
|
||||
cleanup()
|
||||
return fmt.Errorf("sync config: %w", err)
|
||||
}
|
||||
if err := tmp.Close(); err != nil {
|
||||
cleanup()
|
||||
return fmt.Errorf("close temp config: %w", err)
|
||||
}
|
||||
if err := os.Rename(tmpName, path); err != nil {
|
||||
cleanup()
|
||||
return fmt.Errorf("rename temp config: %w", err)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func allowedKeys() []string {
|
||||
func isAllowedKey(key string) bool {
|
||||
for _, k := range AllowedKeys() {
|
||||
if k == key {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
// AllowedKeys returns the list of dotted config keys that
|
||||
// `gnoma config set` accepts. Exported so the CLI subcommand can
|
||||
// present the same list in its help text and validation.
|
||||
func AllowedKeys() []string {
|
||||
return []string{
|
||||
"provider.default", "provider.model", "permission.mode",
|
||||
"slm.model_url", "slm.enabled", "slm.data_dir",
|
||||
|
||||
@@ -0,0 +1,200 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// TestSetProjectConfig_FreshFileWritesOnlyTheKey verifies the core
|
||||
// fix: a `setConfig` call on a non-existent file writes ONLY the
|
||||
// key the user is setting, with no zero-spam. This is what stops
|
||||
// `gnoma config set provider.default anthropic` from emitting
|
||||
// `permission.mode = ""` and silently shadowing a global setting.
|
||||
//
|
||||
// Regression test for the 2026-05-24 silent-corruption symptom.
|
||||
func TestSetProjectConfig_FreshFileWritesOnlyTheKey(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
if err := setConfig(path, "provider.default", "anthropic"); err != nil {
|
||||
t.Fatalf("setConfig: %v", err)
|
||||
}
|
||||
|
||||
data, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
t.Fatalf("read result: %v", err)
|
||||
}
|
||||
body := string(data)
|
||||
|
||||
if !strings.Contains(body, "default = \"anthropic\"") {
|
||||
t.Errorf("result missing the set value, got:\n%s", body)
|
||||
}
|
||||
if strings.Contains(body, "permission") {
|
||||
t.Errorf("result contains [permission] zero-spam, got:\n%s", body)
|
||||
}
|
||||
if strings.Contains(body, "mode") {
|
||||
t.Errorf("result contains 'mode' key (likely zero-spam), got:\n%s", body)
|
||||
}
|
||||
if strings.Contains(body, "max_tokens") {
|
||||
t.Errorf("result contains 'max_tokens' (zero-spam from non-pointer default), got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_RoundTripPreservesUserValues verifies that
|
||||
// the user's previously-set values survive a second `setConfig` call.
|
||||
// The encoder doesn't drop fields that were in the source.
|
||||
func TestSetProjectConfig_RoundTripPreservesUserValues(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
if err := setConfig(path, "permission.mode", "deny"); err != nil {
|
||||
t.Fatalf("first setConfig: %v", err)
|
||||
}
|
||||
if err := setConfig(path, "provider.default", "anthropic"); err != nil {
|
||||
t.Fatalf("second setConfig: %v", err)
|
||||
}
|
||||
|
||||
data, _ := os.ReadFile(path)
|
||||
body := string(data)
|
||||
|
||||
if !strings.Contains(body, "default = \"anthropic\"") {
|
||||
t.Errorf("second setConfig lost the new value, got:\n%s", body)
|
||||
}
|
||||
if !strings.Contains(body, "mode = \"deny\"") {
|
||||
t.Errorf("second setConfig lost the prior permission.mode, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_ReplacesZeroSpamForSetField verifies the
|
||||
// user-recovery path: a file already polluted with `mode = ""`
|
||||
// zero-spam gets corrected when the user re-sets that key.
|
||||
func TestSetProjectConfig_ReplacesZeroSpamForSetField(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
// Pre-populate with a zero-spammed value.
|
||||
if err := os.WriteFile(path, []byte("[permission]\nmode = \"\"\n"), 0o644); err != nil {
|
||||
t.Fatalf("seed: %v", err)
|
||||
}
|
||||
|
||||
if err := setConfig(path, "permission.mode", "auto"); err != nil {
|
||||
t.Fatalf("setConfig: %v", err)
|
||||
}
|
||||
|
||||
data, _ := os.ReadFile(path)
|
||||
body := string(data)
|
||||
|
||||
if strings.Contains(body, "mode = \"\"") {
|
||||
t.Errorf("zero-spam mode=\"\" not replaced, got:\n%s", body)
|
||||
}
|
||||
if !strings.Contains(body, "mode = \"auto\"") {
|
||||
t.Errorf("new value not present, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_RejectsUnknownKey verifies the allowlist
|
||||
// guard. Unknown keys must error, not silently no-op.
|
||||
func TestSetProjectConfig_RejectsUnknownKey(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
err := setConfig(path, "not.a.real.key", "x")
|
||||
if err == nil {
|
||||
t.Fatal("expected error for unknown key, got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "unknown config key") {
|
||||
t.Errorf("error %q does not contain \"unknown config key\"", err)
|
||||
}
|
||||
if _, statErr := os.Stat(path); !os.IsNotExist(statErr) {
|
||||
t.Errorf("file was created on rejection: stat err = %v", statErr)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_AtomicWriteLeavesNoTempFile verifies that
|
||||
// the write is atomic: after a successful call, no .tmp or similar
|
||||
// file remains in the config directory.
|
||||
func TestSetProjectConfig_AtomicWriteLeavesNoTempFile(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
if err := setConfig(path, "tui.theme", "dracula"); err != nil {
|
||||
t.Fatalf("setConfig: %v", err)
|
||||
}
|
||||
|
||||
entries, err := os.ReadDir(dir)
|
||||
if err != nil {
|
||||
t.Fatalf("ReadDir: %v", err)
|
||||
}
|
||||
for _, e := range entries {
|
||||
if e.Name() != "config.toml" {
|
||||
t.Errorf("unexpected leftover file: %q", e.Name())
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_OmitsEmptyStringField verifies the omitempty
|
||||
// fix at the field level: setting a string field to "" does not
|
||||
// emit the field. This is the layer that stops a user setting
|
||||
// `tui.theme = ""` (or any other empty string) from re-introducing
|
||||
// zero-spam.
|
||||
func TestSetProjectConfig_OmitsEmptyStringField(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
// tui.theme is on the allowlist; setting it to the empty
|
||||
// string must not emit a theme line in the rewritten
|
||||
// file.
|
||||
if err := setConfig(path, "tui.theme", ""); err != nil {
|
||||
t.Fatalf("setConfig: %v", err)
|
||||
}
|
||||
data, _ := os.ReadFile(path)
|
||||
body := string(data)
|
||||
if strings.Contains(body, "theme") {
|
||||
t.Errorf("empty theme still emitted, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_SetsBoolFieldCorrectly verifies that the
|
||||
// whitelisted `tui.vim` boolean (kept as a non-pointer bool per
|
||||
// the plan — the default-equals-false case where the encoder can
|
||||
// skip without losing user intent) round-trips for the `true`
|
||||
// case. The `false` case is the Go zero value, so omitempty drops
|
||||
// it — which matches the user's effective intent.
|
||||
func TestSetProjectConfig_SetsBoolFieldCorrectly(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
if err := setConfig(path, "tui.vim", "true"); err != nil {
|
||||
t.Fatalf("setConfig: %v", err)
|
||||
}
|
||||
data, _ := os.ReadFile(path)
|
||||
if !strings.Contains(string(data), "vim = true") {
|
||||
t.Errorf("vim=true not present, got:\n%s", data)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSetProjectConfig_SLMEnabledOmitsDurationFields verifies the
|
||||
// 2026-06-04 follow-up fix: setting `slm.enabled = true` on a
|
||||
// fresh file no longer emits `startup_timeout = 0` or
|
||||
// `classify_timeout = 0` zero-spam. Both Duration fields are
|
||||
// pointer-converted (`*Duration`) so the encoder honors
|
||||
// `omitempty` when the pointer is nil.
|
||||
func TestSetProjectConfig_SLMEnabledOmitsDurationFields(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "config.toml")
|
||||
|
||||
if err := setConfig(path, "slm.enabled", "true"); err != nil {
|
||||
t.Fatalf("setConfig: %v", err)
|
||||
}
|
||||
data, _ := os.ReadFile(path)
|
||||
body := string(data)
|
||||
|
||||
if strings.Contains(body, "startup_timeout") {
|
||||
t.Errorf("startup_timeout emitted as zero-spam, got:\n%s", body)
|
||||
}
|
||||
if strings.Contains(body, "classify_timeout") {
|
||||
t.Errorf("classify_timeout emitted as zero-spam, got:\n%s", body)
|
||||
}
|
||||
}
|
||||
@@ -49,7 +49,7 @@ func ParseHookDefs(cfgs []config.HookConfig) ([]HookDef, error) {
|
||||
Command: cmd,
|
||||
Exec: c.Exec,
|
||||
Timeout: timeout,
|
||||
FailOpen: c.FailOpen,
|
||||
FailOpen: c.FailOpen != nil && *c.FailOpen,
|
||||
ToolPattern: toolPattern,
|
||||
}
|
||||
if err := def.Validate(); err != nil {
|
||||
|
||||
@@ -8,6 +8,7 @@ import (
|
||||
)
|
||||
|
||||
func TestParseHookDefs_ValidConfig(t *testing.T) {
|
||||
failOpen := true
|
||||
cfgs := []config.HookConfig{
|
||||
{
|
||||
Name: "log-tools",
|
||||
@@ -15,7 +16,7 @@ func TestParseHookDefs_ValidConfig(t *testing.T) {
|
||||
Type: "command",
|
||||
Exec: "tee -a /tmp/log.jsonl",
|
||||
Timeout: "5s",
|
||||
FailOpen: true,
|
||||
FailOpen: &failOpen,
|
||||
ToolPattern: "bash*",
|
||||
},
|
||||
}
|
||||
|
||||
@@ -105,13 +105,18 @@ func (l *Loader) Load(plugins []Plugin, enabledSet map[string]bool, pins PinStor
|
||||
if execPath != "" && !filepath.IsAbs(execPath) {
|
||||
execPath = filepath.Join(p.Dir, execPath)
|
||||
}
|
||||
var failOpen *bool
|
||||
if h.FailOpen {
|
||||
v := true
|
||||
failOpen = &v
|
||||
}
|
||||
result.Hooks = append(result.Hooks, config.HookConfig{
|
||||
Name: h.Name,
|
||||
Event: h.Event,
|
||||
Type: h.Type,
|
||||
Exec: execPath,
|
||||
Timeout: h.Timeout,
|
||||
FailOpen: h.FailOpen,
|
||||
FailOpen: failOpen,
|
||||
ToolPattern: h.ToolPattern,
|
||||
})
|
||||
}
|
||||
|
||||
@@ -186,6 +186,26 @@ func translateRequest(req provider.Request) oai.ChatCompletionNewParams {
|
||||
params.ReasoningEffort = effortToReasoningEffort(req.Thinking.Level)
|
||||
}
|
||||
|
||||
// Honour ResponseFormat. ollama (via OpenAI-compatible endpoint) and
|
||||
// llama.cpp both translate response_format=json_object to a decoding-
|
||||
// time JSON constraint, which is the only reliable way to keep small
|
||||
// models from emitting prose where structured output is required.
|
||||
// Previously this field was silently dropped on the OpenAI path,
|
||||
// which is why the SLM classifier saw a 100% prose-failure rate even
|
||||
// after Move 1 wired ResponseFormat at the gnoma layer.
|
||||
if req.ResponseFormat != nil {
|
||||
switch req.ResponseFormat.Type {
|
||||
case provider.ResponseJSON:
|
||||
params.ResponseFormat = oai.ChatCompletionNewParamsResponseFormatUnion{
|
||||
OfJSONObject: &shared.ResponseFormatJSONObjectParam{},
|
||||
}
|
||||
case provider.ResponseText:
|
||||
params.ResponseFormat = oai.ChatCompletionNewParamsResponseFormatUnion{
|
||||
OfText: &shared.ResponseFormatTextParam{},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if len(params.Tools) > 0 {
|
||||
choice := "auto"
|
||||
if req.ToolChoice != "" {
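On the wire, the OpenAI-compatible chat-completions request carries this setting as a `response_format` object with a `type` of `json_object` or `text`. The sketch below builds that fragment with `encoding/json` only, to show what the translated field amounts to. The `ResponseKind` enum, `chatRequest` struct, and `buildBody` function are hypothetical names for illustration, and the body is deliberately incomplete (a real request also needs `messages`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResponseKind is a hypothetical stand-in for the provider-level enum.
type ResponseKind int

const (
	ResponseUnset ResponseKind = iota
	ResponseText
	ResponseJSON
)

// chatRequest models only the two fields relevant to this sketch.
type chatRequest struct {
	Model          string            `json:"model"`
	ResponseFormat map[string]string `json:"response_format,omitempty"`
}

// buildBody serialises the request; an unset kind omits response_format
// entirely, mirroring the "leave zero-valued when not set" behaviour.
func buildBody(model string, kind ResponseKind) ([]byte, error) {
	req := chatRequest{Model: model}
	switch kind {
	case ResponseJSON:
		req.ResponseFormat = map[string]string{"type": "json_object"}
	case ResponseText:
		req.ResponseFormat = map[string]string{"type": "text"}
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildBody("qwen2.5-coder:1.5b", ResponseJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"model":"qwen2.5-coder:1.5b","response_format":{"type":"json_object"}}
}
```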
|
||||
|
||||
@@ -189,3 +189,47 @@ func TestTranslateRequest_ToolChoiceDefault(t *testing.T) {
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestTranslateRequest_ResponseFormatJSON(t *testing.T) {
|
||||
req := provider.Request{
|
||||
Model: "qwen2.5-coder:1.5b",
|
||||
Messages: []message.Message{
|
||||
{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "hi"}}},
|
||||
},
|
||||
ResponseFormat: &provider.ResponseFormat{Type: provider.ResponseJSON},
|
||||
}
|
||||
params := translateRequest(req)
|
||||
if params.ResponseFormat.OfJSONObject == nil {
|
||||
t.Errorf("expected OfJSONObject set when ResponseFormat=ResponseJSON, got %+v", params.ResponseFormat)
|
||||
}
|
||||
if params.ResponseFormat.OfText != nil {
|
||||
t.Errorf("expected OfText nil when ResponseFormat=ResponseJSON")
|
||||
}
|
||||
}
|
||||
|
||||
func TestTranslateRequest_ResponseFormatText(t *testing.T) {
|
||||
req := provider.Request{
|
||||
Model: "qwen2.5-coder:1.5b",
|
||||
Messages: []message.Message{
|
||||
{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "hi"}}},
|
||||
},
|
||||
ResponseFormat: &provider.ResponseFormat{Type: provider.ResponseText},
|
||||
}
|
||||
params := translateRequest(req)
|
||||
if params.ResponseFormat.OfText == nil {
|
||||
t.Errorf("expected OfText set when ResponseFormat=ResponseText, got %+v", params.ResponseFormat)
|
||||
}
|
||||
}
|
||||
|
||||
func TestTranslateRequest_ResponseFormatUnset(t *testing.T) {
|
||||
req := provider.Request{
|
||||
Model: "qwen2.5-coder:1.5b",
|
||||
Messages: []message.Message{
|
||||
{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "hi"}}},
|
||||
},
|
||||
}
|
||||
params := translateRequest(req)
|
||||
if params.ResponseFormat.OfJSONObject != nil || params.ResponseFormat.OfText != nil {
|
||||
t.Errorf("expected zero-valued ResponseFormat when not set, got %+v", params.ResponseFormat)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -93,16 +93,27 @@ func DiscoverOllama(ctx context.Context, baseURL string, probeCache map[string]O
 		Size: m.Size,
 	}

+	// Always probe; the cache is optional. Previously nil-cache was
+	// treated as "skip probing entirely", which left SupportsTools
+	// at its zero value (false) for every model — every ollama-
+	// discovered arm then got marked as tool-unsupported and
+	// rejected by filterFeasible for any tool-requiring task. main.go
+	// passes nil from the synchronous discovery path; we still want
+	// real probe data there.
+	var result OllamaProbeResult
 	if probeCache != nil {
-		result, ok := probeCache[m.Name]
-		if !ok {
+		if cached, ok := probeCache[m.Name]; ok {
+			result = cached
+		} else {
 			result = probeOllamaModel(ctx, baseURL, m.Name)
 			probeCache[m.Name] = result
 		}
-		dm.SupportsTools = result.SupportsTools
-		dm.SupportsVision = result.SupportsVision
-		dm.ContextSize = result.ContextSize
+	} else {
+		result = probeOllamaModel(ctx, baseURL, m.Name)
 	}
+	dm.SupportsTools = result.SupportsTools
+	dm.SupportsVision = result.SupportsVision
+	dm.ContextSize = result.ContextSize

 	if dm.ContextSize == 0 {
 		dm.ContextSize = defaultOllamaContextSize
||||
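The fix in the `DiscoverOllama` hunk treats the cache as an optimisation rather than a switch: a nil map disables memoisation but never disables probing. A minimal sketch of that shape follows, assuming a hypothetical `probeResult` type and a probe function passed in as a parameter so the example runs without an Ollama server.

```go
package main

import "fmt"

// probeResult is a hypothetical, reduced version of a probe outcome.
type probeResult struct {
	SupportsTools bool
	ContextSize   int
}

// probeWithCache always returns real probe data. A nil cache only means
// "do not memoise"; it never means "skip the probe".
func probeWithCache(cache map[string]probeResult, name string, probe func(string) probeResult) probeResult {
	if cache == nil {
		return probe(name)
	}
	if cached, ok := cache[name]; ok {
		return cached
	}
	r := probe(name)
	cache[name] = r
	return r
}

func main() {
	calls := 0
	probe := func(string) probeResult {
		calls++
		return probeResult{SupportsTools: true, ContextSize: 8192}
	}

	cache := map[string]probeResult{}
	probeWithCache(cache, "m", probe) // probes and stores
	probeWithCache(cache, "m", probe) // served from the cache
	r := probeWithCache(nil, "m", probe) // nil cache still probes

	fmt.Println(calls, r.SupportsTools) // 2 true
}
```

Keeping the capability assignment outside the cache branch, as the diff does, is what prevents the zero value (`false`) from leaking into the result when no cache is supplied.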
@@ -1,6 +1,7 @@
|
||||
package router
|
||||
|
||||
import (
|
||||
"log/slog"
|
||||
"math"
|
||||
)
|
||||
|
||||
@@ -281,20 +282,39 @@ func effectiveCost(arm *Arm, task Task) float64 {
|
||||
// filterFeasible returns arms that can handle the task (tools, pool capacity, quality).
|
||||
// Arms that pass tool and pool checks but fall below the task's minimum quality threshold
|
||||
// are collected separately and used as a last resort if no arm meets the threshold.
|
||||
//
|
||||
// When the result is empty the caller surfaces a generic "no feasible arm"
|
||||
// error; rejection reasons are logged here at slog.Debug per-arm so users
|
||||
// debugging "why did the router reject everything?" with --verbose can see
|
||||
// the actual constraint each arm tripped instead of guessing.
|
||||
func filterFeasible(arms []*Arm, task Task) []*Arm {
|
||||
threshold := DefaultThresholds[task.Type]
|
||||
|
||||
var feasible []*Arm
|
||||
var belowQuality []*Arm // passed tool+pool but scored below minimum quality
|
||||
|
||||
reject := func(arm *Arm, reason string, fields ...any) {
|
||||
base := []any{
|
||||
"arm", arm.ID,
|
||||
"task", task.Type,
|
||||
"complexity", task.ComplexityScore,
|
||||
"reason", reason,
|
||||
}
|
||||
slog.Debug("filterFeasible: rejected", append(base, fields...)...)
|
||||
}
|
||||
|
||||
for _, arm := range arms {
|
||||
// Complexity ceiling: zero means no ceiling (preserves behavior for all existing arms).
|
||||
if arm.MaxComplexity > 0 && task.ComplexityScore > arm.MaxComplexity {
|
||||
reject(arm, "complexity_exceeds_max",
|
||||
"max_complexity", arm.MaxComplexity)
|
||||
continue
|
||||
}
|
||||
|
||||
// Must support tools if task requires them
|
||||
if task.RequiresTools && !arm.SupportsTools() {
|
||||
reject(arm, "tools_required_but_unsupported",
|
||||
"tool_use_capability", arm.Capabilities.ToolUse)
|
||||
continue
|
||||
}
|
||||
|
||||
@@ -303,11 +323,15 @@ func filterFeasible(arms []*Arm, task Task) []*Arm {
|
||||
// cannot consume the image bytes, so degrading to it would silently
|
||||
// drop the image and confuse the model.
|
||||
if task.RequiresVision && !arm.Capabilities.Vision {
|
||||
reject(arm, "vision_required_but_unsupported",
|
||||
"vision_capability", arm.Capabilities.Vision)
|
||||
continue
|
||||
}
|
||||
|
||||
// Must support the required effort level (EffortAuto always passes)
|
||||
if !arm.Capabilities.SupportsEffort(task.RequiredEffort) {
|
||||
reject(arm, "effort_level_unsupported",
|
||||
"required_effort", task.RequiredEffort)
|
||||
continue
|
||||
}
|
||||
|
||||
@@ -316,6 +340,8 @@ func filterFeasible(arms []*Arm, task Task) []*Arm {
|
||||
for _, pool := range arm.Pools {
|
||||
pool.CheckReset()
|
||||
if !pool.CanAfford(arm.ID, task.EstimatedTokens) {
|
||||
reject(arm, "pool_capacity_exceeded",
|
||||
"estimated_tokens", task.EstimatedTokens)
|
||||
poolsOK = false
|
||||
break
|
||||
}
|
||||
@@ -333,6 +359,16 @@ func filterFeasible(arms []*Arm, task Task) []*Arm {
|
||||
feasible = append(feasible, arm)
|
||||
}
|
||||
|
||||
if len(feasible) == 0 && len(belowQuality) == 0 {
|
||||
slog.Debug("filterFeasible: no arms feasible at any quality level",
|
||||
"task", task.Type,
|
||||
"complexity", task.ComplexityScore,
|
||||
"requires_tools", task.RequiresTools,
|
||||
"requires_vision", task.RequiresVision,
|
||||
"arms_considered", len(arms),
|
||||
)
|
||||
}
|
||||
|
||||
// Degrade gracefully: if no arm meets quality threshold, use below-quality ones
|
||||
if len(feasible) == 0 && len(belowQuality) > 0 {
|
||||
return belowQuality
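The logging pattern added to `filterFeasible` can be sketched in isolation: a small closure attaches the common fields once, and each constraint check supplies only its reason and any extra key/value pairs. The `arm` struct, its `Quality` field, and the `filter` function below are hypothetical simplifications (the real quality scoring is not shown in this diff); `log/slog` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"log/slog"
)

// arm is a hypothetical, reduced model of a routing arm.
type arm struct {
	ID            string
	SupportsTools bool
	Quality       float64
}

// filter returns arms meeting every constraint. Arms that fail only the
// quality threshold are kept as a last-resort fallback.
func filter(arms []arm, needTools bool, minQuality float64) []arm {
	var feasible, belowQuality []arm

	// reject logs one structured Debug record per rejected arm.
	reject := func(a arm, reason string, fields ...any) {
		base := []any{"arm", a.ID, "reason", reason}
		slog.Debug("filter: rejected", append(base, fields...)...)
	}

	for _, a := range arms {
		if needTools && !a.SupportsTools {
			reject(a, "tools_required_but_unsupported")
			continue
		}
		if a.Quality < minQuality {
			belowQuality = append(belowQuality, a)
			continue
		}
		feasible = append(feasible, a)
	}
	if len(feasible) == 0 {
		return belowQuality // degrade gracefully; may itself be empty
	}
	return feasible
}

func main() {
	arms := []arm{
		{ID: "a", SupportsTools: false, Quality: 0.9},
		{ID: "b", SupportsTools: true, Quality: 0.3},
	}
	for _, a := range filter(arms, true, 0.5) {
		fmt.Println(a.ID) // b
	}
}
```

With the default `slog` handler, Debug records are suppressed unless the level is lowered, so the per-arm output stays out of normal runs.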
|
||||
|
||||
@@ -14,10 +14,13 @@ import (
 	"somegit.dev/Owlibou/gnoma/internal/stream"
 )

-// defaultClassifyTimeout — 5 s accommodates thinking-mode models like
-// Qwen3 distillations (Tiny3.5) that emit reasoning tokens before output.
-// Non-thinking models complete in well under 1 s.
-const defaultClassifyTimeout = 5 * time.Second
+// defaultClassifyTimeout — 15 s accommodates cold-start model loads
+// (ollama lazily loads on first call, ~2-8s for a 1.5B model on SSD)
+// combined with thinking-mode first-token latency (Qwen3 distillations
+// like Tiny3.5 sometimes emit <think> tokens before the JSON output
+// even with /no_think). Non-thinking warm models complete in well
+// under 1 s. Tune via [slm].classify_timeout in config.
+const defaultClassifyTimeout = 15 * time.Second

 const classifySystemPrompt = `Classify the following coding request. /no_think
 Respond with JSON only, no other text, no reasoning, no thinking tags.
|
||||
@@ -47,14 +50,18 @@ type Classifier struct {

 // NewClassifier creates a Classifier. model is the model name passed to the provider
 // (llamafile ignores it but openaicompat requires a non-empty value).
-func NewClassifier(p provider.Provider, model string, logger *slog.Logger) *Classifier {
+// Pass timeout=0 to use the built-in default (defaultClassifyTimeout).
+func NewClassifier(p provider.Provider, model string, timeout time.Duration, logger *slog.Logger) *Classifier {
 	if logger == nil {
 		logger = slog.Default()
 	}
+	if timeout <= 0 {
+		timeout = defaultClassifyTimeout
+	}
 	return &Classifier{
 		provider: p,
 		model: model,
-		timeout: defaultClassifyTimeout,
+		timeout: timeout,
 		logger: logger,
 	}
 }
|
||||
@@ -68,7 +75,11 @@ func (c *Classifier) Classify(ctx context.Context, prompt string, history []mess

 	resp, err := c.callSLM(tctx, prompt)
 	if err != nil {
-		c.logger.Debug("slm classify fallback", "error", err)
+		// Warn-level so a first-time misconfiguration (timeout too tight,
+		// wrong endpoint, malformed JSON from the model) surfaces without
+		// requiring --verbose. The fallback path itself is benign; the
+		// signal is that the SLM isn't doing the work it was supposed to.
+		c.logger.Warn("slm classify fallback", "error", err, "timeout", c.timeout)
 		t, ferr := router.HeuristicClassifier{}.Classify(ctx, prompt, history)
 		t.ClassifierSource = router.ClassifierSLMFallback
 		return t, ferr
|
||||
@@ -91,9 +102,25 @@ func (c *Classifier) Classify(ctx context.Context, prompt string, history []mess
|
||||
}
|
||||
|
||||
func (c *Classifier) callSLM(ctx context.Context, prompt string) (*classifyResponse, error) {
|
||||
// Constrain the model toward valid, deterministic JSON output. Without
|
||||
// these settings small models routinely ignore the JSON-only system
|
||||
// prompt, emit reasoning blocks (<think>, <Thought Process>) or just
|
||||
// answer the user's prompt in prose. ResponseFormat=json_object asks
|
||||
// the provider to enforce JSON at decoding time where supported
|
||||
// (ollama 'format=json', llama.cpp grammar, OpenAI json_object). Even
|
||||
// when the provider can't enforce, the explicit signal nudges the
|
||||
// adapter to set the right backend flag.
|
||||
temp := 0.0
|
||||
topP := 1.0
|
||||
req := provider.Request{
|
||||
Model: c.model,
|
||||
SystemPrompt: classifySystemPrompt,
|
||||
Temperature: &temp,
|
||||
TopP: &topP,
|
||||
MaxTokens: 128, // classification output is ~50 tokens; cap to prevent runaway reasoning
|
||||
ResponseFormat: &provider.ResponseFormat{
|
||||
Type: provider.ResponseJSON,
|
||||
},
|
||||
Messages: []message.Message{
|
||||
{
|
||||
Role: message.RoleUser,
|
||||
@@ -127,10 +154,22 @@ func (c *Classifier) callSLM(ctx context.Context, prompt string) (*classifyRespo
 	return &resp, nil
 }

-// extractJSON pulls the first {...} substring from s, stripping markdown fences if present.
+// extractJSON pulls the first {...} substring from s, stripping markdown
+// fences and known thinking-block tags. Small models routinely violate
+// the JSON-only system prompt by emitting reasoning tokens first, so
+// the extractor must tolerate prefixes the model wasn't asked to emit.
 func extractJSON(s string) string {
 	s = strings.TrimSpace(s)

+	// Strip known thinking-block tags. Order matters: longer/more-
+	// specific names first so a partial match doesn't shadow a real
+	// one. Seen in the wild on Qwen3 (<think>) and tiny3.5
+	// (<Thought Process>); the others are defensive against similar
+	// fine-tunes.
+	for _, tag := range []string{"Thought Process", "thinking", "reasoning", "thoughts", "think"} {
+		s = stripTagBlock(s, tag)
+	}
+
 	// Strip ```json ... ``` fences.
 	if strings.HasPrefix(s, "```") {
 		end := strings.LastIndex(s, "```")
|
||||
@@ -160,3 +199,28 @@ func extractJSON(s string) string {
|
||||
}
|
||||
return s[start:]
|
||||
}
|
||||
|
||||
// stripTagBlock removes <tag>...</tag> blocks (case-insensitive on the
|
||||
// tag name) from the start of s. Returns the original string if the tag
|
||||
// is not at the start. Idempotent; safe to call repeatedly.
|
||||
func stripTagBlock(s, tag string) string {
|
||||
trimmed := strings.TrimSpace(s)
|
||||
open := "<" + tag
|
||||
lower := strings.ToLower(trimmed)
|
||||
if !strings.HasPrefix(lower, strings.ToLower(open)) {
|
||||
return s
|
||||
}
|
||||
// Find the matching closing tag, case-insensitive.
|
||||
close := "</" + tag + ">"
|
||||
closeIdx := strings.Index(strings.ToLower(trimmed), strings.ToLower(close))
|
||||
if closeIdx < 0 {
|
||||
// Unterminated thinking block — strip up to the first '{'
|
||||
// so we still have a shot at extracting JSON that follows.
|
||||
braceIdx := strings.IndexByte(trimmed, '{')
|
||||
if braceIdx > 0 {
|
||||
return strings.TrimSpace(trimmed[braceIdx:])
|
||||
}
|
||||
return s
|
||||
}
|
||||
return strings.TrimSpace(trimmed[closeIdx+len(close):])
|
||||
}
|
||||
|
||||
@@ -54,7 +54,7 @@ func TestClassifier_HappyPath(t *testing.T) {
|
||||
// SLM complexity 0.55 stays above the Debug floor (0.4), so the SLM
|
||||
// value is preserved verbatim.
|
||||
p := &mockProvider{text: `{"task_type":"Debug","complexity":0.55,"requires_tools":false}`}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
task, err := cls.Classify(context.Background(), "fix the failing test", nil)
|
||||
if err != nil {
|
||||
@@ -76,7 +76,7 @@ func TestClassifier_AppliesTaskTypeFloor(t *testing.T) {
|
||||
// bump ComplexityScore up to the floor so the SLM arm can't be picked
|
||||
// for its own kind of misclassification.
|
||||
p := &mockProvider{text: `{"task_type":"Debug","complexity":0.25,"requires_tools":false}`}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
task, err := cls.Classify(context.Background(), "fix the failing test", nil)
|
||||
if err != nil {
|
||||
@@ -91,7 +91,7 @@ func TestClassifier_AppliesTaskTypeFloor(t *testing.T) {
|
||||
func TestClassifier_BlendHeuristic(t *testing.T) {
|
||||
// SLM returns one type; other Task fields should come from heuristic.
|
||||
p := &mockProvider{text: `{"task_type":"Boilerplate","complexity":0.1,"requires_tools":false}`}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
task, err := cls.Classify(context.Background(), "scaffold a new HTTP handler", nil)
|
||||
if err != nil {
|
||||
@@ -108,7 +108,7 @@ func TestClassifier_BlendHeuristic(t *testing.T) {
|
||||
|
||||
func TestClassifier_FallbackOnBadJSON(t *testing.T) {
|
||||
p := &mockProvider{text: "I cannot classify that."}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
// Should not error — falls back to heuristic.
|
||||
task, err := cls.Classify(context.Background(), "write unit tests for the parser", nil)
|
||||
@@ -123,7 +123,7 @@ func TestClassifier_FallbackOnBadJSON(t *testing.T) {
|
||||
|
||||
func TestClassifier_FallbackOnProviderError(t *testing.T) {
|
||||
p := &mockProvider{err: errors.New("connection refused")}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
task, err := cls.Classify(context.Background(), "explain how generics work", nil)
|
||||
if err != nil {
|
||||
@@ -137,7 +137,7 @@ func TestClassifier_FallbackOnProviderError(t *testing.T) {
|
||||
|
||||
func TestClassifier_FallbackOnTimeout(t *testing.T) {
|
||||
p := &mockProvider{delay: 500 * time.Millisecond}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
cls.timeout = 50 * time.Millisecond // force timeout
|
||||
|
||||
task, err := cls.Classify(context.Background(), "debug the failing test", nil)
|
||||
@@ -153,7 +153,7 @@ func TestClassifier_FallbackOnTimeout(t *testing.T) {
|
||||
func TestClassifier_FenceStripping(t *testing.T) {
|
||||
fenced := "```json\n{\"task_type\":\"Refactor\",\"complexity\":0.5,\"requires_tools\":true}\n```"
|
||||
p := &mockProvider{text: fenced}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
task, err := cls.Classify(context.Background(), "refactor the auth middleware", nil)
|
||||
if err != nil {
|
||||
@@ -166,7 +166,7 @@ func TestClassifier_FenceStripping(t *testing.T) {
|
||||
|
||||
func TestClassifier_UnknownTaskType_FallsBackToHeuristic(t *testing.T) {
|
||||
p := &mockProvider{text: `{"task_type":"FooBar","complexity":0.3,"requires_tools":false}`}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
task, err := cls.Classify(context.Background(), "implement a binary search function", nil)
|
||||
if err != nil {
|
||||
@@ -178,7 +178,7 @@ func TestClassifier_UnknownTaskType_FallsBackToHeuristic(t *testing.T) {
|
||||
|
||||
func TestClassifier_SetsClassifierSource_OnSuccess(t *testing.T) {
|
||||
p := &mockProvider{text: `{"task_type":"Debug","complexity":0.3,"requires_tools":true}`}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
task, err := cls.Classify(context.Background(), "fix the failing test", nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
@@ -190,7 +190,7 @@ func TestClassifier_SetsClassifierSource_OnSuccess(t *testing.T) {
|
||||
|
||||
func TestClassifier_SetsClassifierSource_OnFallback(t *testing.T) {
|
||||
p := &mockProvider{err: errors.New("backend unreachable")}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
task, err := cls.Classify(context.Background(), "fix the failing test", nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
@@ -202,7 +202,7 @@ func TestClassifier_SetsClassifierSource_OnFallback(t *testing.T) {
|
||||
|
||||
func TestClassifier_ContextPassedToHistory(t *testing.T) {
|
||||
p := &mockProvider{text: `{"task_type":"Explain","complexity":0.2,"requires_tools":false}`}
|
||||
cls := NewClassifier(p, "default", nil)
|
||||
cls := NewClassifier(p, "default", 0, nil)
|
||||
|
||||
history := []message.Message{
|
||||
{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "prior"}}},
|
||||
@@ -215,3 +215,45 @@ func TestClassifier_ContextPassedToHistory(t *testing.T) {
|
||||
t.Errorf("Type = %s, want Explain", task.Type)
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractJSON_StripsThinkingTags(t *testing.T) {
|
||||
cases := []struct {
|
||||
name string
|
||||
in string
|
||||
want string
|
||||
}{
|
||||
{
|
||||
name: "qwen-think-block",
|
||||
in: `<think>Let me decide</think>{"task_type":"Debug","complexity":0.5,"requires_tools":true}`,
|
||||
want: `{"task_type":"Debug","complexity":0.5,"requires_tools":true}`,
|
||||
},
|
||||
{
|
||||
name: "tiny3.5-thought-process",
|
||||
in: "<Thought Process>\nUser wants debugging help.\n</Thought Process>\n{\"task_type\":\"Debug\",\"complexity\":0.4,\"requires_tools\":true}",
|
||||
want: `{"task_type":"Debug","complexity":0.4,"requires_tools":true}`,
|
||||
},
|
||||
{
|
||||
name: "unterminated-think-falls-back-to-brace",
|
||||
in: `<think>incomplete reasoning {"task_type":"Explain","complexity":0.2,"requires_tools":false}`,
|
||||
want: `{"task_type":"Explain","complexity":0.2,"requires_tools":false}`,
|
||||
},
|
||||
{
|
||||
name: "no-tags-still-works",
|
||||
in: `{"task_type":"Generation","complexity":0.6,"requires_tools":false}`,
|
||||
want: `{"task_type":"Generation","complexity":0.6,"requires_tools":false}`,
|
||||
},
|
||||
{
|
||||
name: "fenced-json-still-works",
|
||||
in: "```json\n{\"task_type\":\"Refactor\",\"complexity\":0.5,\"requires_tools\":true}\n```",
|
||||
want: `{"task_type":"Refactor","complexity":0.5,"requires_tools":true}`,
|
||||
},
|
||||
}
|
||||
for _, tc := range cases {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
got := extractJSON(tc.in)
|
||||
if got != tc.want {
|
||||
t.Errorf("extractJSON(...)\n got: %q\n want: %q", got, tc.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1146,6 +1146,15 @@ func (m Model) submitInput(input string) (tea.Model, tea.Cmd) {
|
||||
m.thinkingBuf.Reset()
|
||||
m.streamFilterClose = ""
|
||||
|
||||
// Recover from a prior StateError before submitting a fresh user
|
||||
// prompt. A transient routing or engine failure used to leave the
|
||||
// session in error state, blocking every subsequent prompt with
|
||||
// "session not idle (state: error)" until the user restarted gnoma.
|
||||
// User-initiated sends always carry an intent-to-retry, so resetting
|
||||
// here is the safe default; the /init retry path has its own explicit
|
||||
// ResetError that we leave alone.
|
||||
m.session.ResetError()
|
||||
|
||||
if err := m.session.Send(expandedInput); err != nil {
|
||||
m.messages = append(m.messages, chatMessage{role: "error", content: formatError(err)})
|
||||
m.streaming = false
|
||||
@@ -1494,6 +1503,8 @@ func (m Model) handleCommand(cmd string) (tea.Model, tea.Cmd) {
|
||||
m.initWriteNudged = false
|
||||
|
||||
opts := engine.TurnOptions{}
|
||||
// Recover from prior StateError before /init can submit.
|
||||
m.session.ResetError()
|
||||
if err := m.session.SendWithOptions(prompt, opts); err != nil {
|
||||
m.messages = append(m.messages, chatMessage{role: "error", content: formatError(err)})
|
||||
m.streaming = false
|
||||
@@ -1695,6 +1706,8 @@ func (m Model) handleCommand(cmd string) (tea.Model, tea.Cmd) {
|
||||
AllowedTools: sk.Frontmatter.AllowedTools,
|
||||
AllowedPaths: sk.Frontmatter.Paths,
|
||||
}
|
||||
// Recover from prior StateError before the skill submits.
|
||||
m.session.ResetError()
|
||||
if err := m.session.SendWithOptions(rendered, skillOpts); err != nil {
|
||||
m.messages = append(m.messages, chatMessage{role: "error", content: formatError(err)})
|
||||
m.streaming = false
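The three TUI hunks above all apply the same rule: clear a stale error state before a user-initiated send, so one failed turn does not block every later prompt. A minimal sketch of that state machine follows. The `session` type, its states, and the no-op behaviour of `ResetError` outside the error state are assumptions made for illustration; the diff does not show gnoma's actual session implementation.

```go
package main

import (
	"errors"
	"fmt"
)

type state int

const (
	stateIdle state = iota
	stateStreaming
	stateError
)

// session is a hypothetical, reduced model of an engine session.
type session struct{ st state }

var errNotIdle = errors.New("session not idle")

// Send accepts a prompt only when the session is idle.
func (s *session) Send(prompt string) error {
	if s.st != stateIdle {
		return errNotIdle
	}
	s.st = stateStreaming
	return nil
}

// ResetError moves an errored session back to idle. In this sketch it is
// a no-op in any other state, so an in-flight turn is never interrupted.
func (s *session) ResetError() {
	if s.st == stateError {
		s.st = stateIdle
	}
}

func main() {
	s := &session{st: stateError}
	fmt.Println(s.Send("hi")) // session not idle

	s.ResetError() // recover before the next user-initiated send
	fmt.Println(s.Send("hi")) // <nil>
}
```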