docs(specs): root implementation roadmap tying Tiers 1-6 to plans

First file in docs/superpowers/specs/. Sequences the 8 today-dated open plans plus the older May plans into a tiered merge order:

  Tier 1: config-migration followups, MiniMax provider, models.dev
  Tier 2: TUI/UX refresh, distribution followups (parallelizable)
  Tier 3: egress allowlist (blocks the wire-fetch path of models.dev refresh)
  Tier 4: cross-platform Phase 1 smoke matrix
  Tier 5: ACP server, ACP client, MAEF (gnoma forge)
  Tier 6: older open plans (config-migration phase 2+, sensitive-content, encoder-bandit router, functiongemma — all telemetry-gated)

Captures the 3 sequencing calls worth push-back: models.dev before egress (offline-first), ACP before MAEF (future-proofs the MAEF Critic), TUI/UX and distribution in parallel.

Leaves the open question of whether specs/ should become the home for sequencing docs and plans/ stays per-feature.
feat(config): upgrade-config --all + doctor cross-file layering
2026-06-04 19:49:44 +02:00 · 2026-06-04 19:23:09 +02:00 · 2026-06-04 18:17:51 +02:00 · 2026-06-04 18:05:14 +02:00 · 2026-06-04 14:03:52 +02:00 · 2026-06-04 13:29:38 +02:00
50 changed files with 7416 additions and 224 deletions
@@ -364,9 +364,12 @@ gnoma can run a tiny local model alongside the main provider to:

 ```toml
 [slm]
-enabled = true
-backend = "auto"           # ollama | llamacpp | llamafile | openaicompat | auto | disabled
-model   = "reecdev/tiny3.5:500m"
+enabled         = true
+backend         = "auto"      # ollama | llamacpp | llamafile | openaicompat | auto | disabled
+model           = "qwen3:0.6b"
+register_as_arm = true        # default; set to false to make the SLM classifier-only
+                              # (e.g. for FunctionGemma, code-completion-tuned models)
+classify_timeout = "15s"      # default; bump higher for slow cold-loads
 ```

 Setup, presets, and verification: [docs/slm-backends.md](docs/slm-backends.md).
@@ -491,6 +494,14 @@ keeps incognito-mode data out of long-lived stores.
 > prompts and tool data are sent to that provider as required to
 > fulfill the request — by design. For fully on-device operation,
 > use Ollama or llama.cpp and `--incognito`.
+>
+> **Project registry.** gnoma writes a list of directories you've
+> launched it from to `~/.config/gnoma/projects.json` (one entry per
+> project, with first/last-seen timestamps and a session count). The
+> file is purely local — never read by anything outside gnoma, never
+> transmitted. It powers `gnoma doctor --all-projects`,
+> `gnoma upgrade-config --all`, and the cross-project session picker.
+> Opt out with `[config].project_registry = false` in your config.

 ### Entropy false-positive reduction

@@ -570,9 +581,22 @@ Architecture, conventions, and TDD workflow: [CONTRIBUTING.md](CONTRIBUTING.md).

 ## About

+### Origin
+
+gnoma started as a **provider-agnostic coding CLI** — the bandit router and
+multi-provider arm system were the original substance. Building it made the
+security gap in existing AI tools obvious: most assume the agent runtime,
+the model provider, and every MCP server in the chain is trusted, then add
+telemetry on top. The security boundaries gnoma ships are the answer to what
+was missing, not the goal it set out with.
+
+### Naming
+
 Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called
 **elfs** (elf owl).

+### Repositories
+
 - **Upstream:** <https://somegit.dev/Owlibou/gnoma>
 - **GitHub mirror:** <https://github.com/VikingOwl91/gnoma> (read-only;
  PRs go to upstream Gitea)
@@ -4,6 +4,128 @@ Active work, newest first.

 ## In flight

+- **TUI/UX refresh — opencode-inspired patterns.** Gap-closing pass over
+  the existing Bubble Tea TUI (`internal/tui/*`), borrowing proven UX
+  patterns from opencode and two layout *concepts* from opentui
+  (re-implemented in Go — opentui is Zig+TS, not consumable here). Items:
+  a labelled plan/build mode toggle over the existing permission-mode
+  cycle (`app.go:643-668`), a leader-key command palette routing to the
+  current pickers, external theme files (`~/.config/gnoma/themes/`),
+  syntax-aware diff rendering for `fs.edit` results, a `/sessions`
+  picker + transcript `/export` (no server — local only), and a small
+  declarative layout helper. Plan:
+  [`docs/superpowers/plans/2026-06-04-tui-ux-opencode.md`](docs/superpowers/plans/2026-06-04-tui-ux-opencode.md).
+
+- **Multi-Agent Engineering Forge (MAEF) — `gnoma forge`.** Deterministic
+  pipeline orchestrator: Context Planner → Forge → Sandbox gate →
+  Cross-Vendor Critic, with programmatic loop-back gates. Maps onto
+  existing machinery — the orchestrator is a Go state machine
+  (`internal/forge`), the three LLM stages are elfs
+  (`elf.Manager.Spawn`/`SpawnWithProvider`), the Sandbox gate is a
+  **non-LLM** Go function over a new `internal/sandbox` (git-worktree
+  default, docker optional behind one interface). Forge emits unified
+  diffs applied via `git apply` (not `fs.edit`); the Critic is pinned to
+  a different vendor/arm than the Forge via `router.ForceArm`. Terminal
+  state-sync failures revert the worktree (no infinite loop). All
+  firewall/audit/egress/CWD boundaries apply per stage. Plan:
+  [`docs/superpowers/plans/2026-06-04-multi-agent-engineering-forge.md`](docs/superpowers/plans/2026-06-04-multi-agent-engineering-forge.md).
+
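The stage order and loop-back gates in the MAEF entry can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the planned `internal/forge` design: `Stage`, `Gate`, `Run`, and the `maxLoops` cap are hypothetical names, and the gates are plain functions standing in for the Sandbox check and the Critic verdict.

```go
package main

import "fmt"

// Stage names follow the TODO entry; the type itself is hypothetical.
type Stage int

const (
	Plan Stage = iota
	Forge
	Sandbox
	Critic
	Done
	Reverted
)

func (s Stage) String() string {
	return [...]string{"plan", "forge", "sandbox", "critic", "done", "reverted"}[s]
}

// Gate reports whether a stage passed. It receives the number of
// loop-backs taken so far, which keeps the sketch deterministic.
type Gate func(loops int) bool

// Run walks Plan -> Forge -> Sandbox -> Critic. A failed Sandbox or
// Critic gate loops back to Forge; once the loop-back count exceeds
// maxLoops the run ends in Reverted instead of retrying forever.
func Run(sandbox, critic Gate, maxLoops int) (Stage, []Stage) {
	trace := []Stage{}
	stage, loops := Plan, 0
	for stage != Done && stage != Reverted {
		trace = append(trace, stage)
		switch stage {
		case Plan:
			stage = Forge
		case Forge:
			stage = Sandbox
		case Sandbox:
			if sandbox(loops) {
				stage = Critic
			} else {
				stage, loops = Forge, loops+1
			}
		case Critic:
			if critic(loops) {
				stage = Done
			} else {
				stage, loops = Forge, loops+1
			}
		}
		if loops > maxLoops {
			stage = Reverted
		}
	}
	return stage, append(trace, stage)
}

func main() {
	// The sandbox gate fails on the first pass and succeeds after one loop-back.
	final, trace := Run(
		func(loops int) bool { return loops >= 1 },
		func(int) bool { return true },
		3,
	)
	fmt.Println(final, trace)
}
```

Keeping the gates as ordinary functions means the orchestrator can be unit-tested without spawning any LLM stage, which matches the entry's point that the Sandbox gate is non-LLM code.
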
+- **models.dev as source of truth for model specs & pricing.** Adopt
+  models.dev (`api.json`) for objective facts — context window, max
+  output, modalities, tool-use, reasoning, **price** — feeding
+  `provider.Capabilities` and the currently-mostly-empty
+  `Arm.CostPer1k{Input,Output}` (`router.go:393,418` seam). Subjective
+  routing policy (`MaxComplexity`/`Strengths`/`CostWeight`/`SizeCaps` in
+  `internal/router/defaults.go`) stays hand-curated — augment, don't
+  replace. Offline-first: a `//go:embed` snapshot ships in the binary;
+  `gnoma models refresh` is opt-in. **Configurable display currency**
+  (USD/EUR/…) with a daily best-effort FX rate fetched on launch and
+  cached; disable → USD (models.dev native). Per-arm price overrides via
+  `[[provider.cost]]` (incl. `billing="subscription"`, intersects the
+  MiniMax plan). `models.dev` + the FX source join the egress allowlist.
+  Plan:
+  [`docs/superpowers/plans/2026-06-04-models-dev-source-of-truth.md`](docs/superpowers/plans/2026-06-04-models-dev-source-of-truth.md).
+
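The pricing rules in the models.dev entry (snapshot price, optional per-arm override, optional display-currency conversion with a USD fallback) can be sketched as below. All names here (`Price`, `FX`, `Resolve`, `Display`, `Cost`) are hypothetical, and the numbers are made up for illustration; they are not models.dev data or real exchange rates.

```go
package main

import "fmt"

// Price is a hypothetical per-1k-token price in USD, the native
// currency of the models.dev data according to the entry above.
type Price struct {
	InputPer1k  float64
	OutputPer1k float64
}

// FX is a cached display-currency rate: units of Currency per 1 USD.
// A zero or negative Rate is treated as "no rate available".
type FX struct {
	Currency string
	Rate     float64
}

// Resolve picks the effective price: a per-arm override, when present,
// wins over the value from the embedded snapshot.
func Resolve(snapshot Price, override *Price) Price {
	if override != nil {
		return *override
	}
	return snapshot
}

// Cost estimates the USD cost of one request from its token counts.
func Cost(p Price, inTokens, outTokens int) float64 {
	return p.InputPer1k*float64(inTokens)/1000 + p.OutputPer1k*float64(outTokens)/1000
}

// Display converts a USD amount to the display currency and falls back
// to USD when conversion is disabled (nil) or no rate is cached.
func Display(usd float64, fx *FX) (float64, string) {
	if fx == nil || fx.Rate <= 0 {
		return usd, "USD"
	}
	return usd * fx.Rate, fx.Currency
}

func main() {
	p := Resolve(Price{InputPer1k: 0.5, OutputPer1k: 1.5}, nil)
	usd := Cost(p, 2000, 1000)
	amount, currency := Display(usd, &FX{Currency: "EUR", Rate: 0.5})
	fmt.Printf("%.2f %s\n", amount, currency)
}
```

Converting only at display time keeps the router's cost math in a single currency, so a stale or missing FX rate can never change arm selection.
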
+- **MiniMax provider — cloud arm + subscription token plan.** Add
+  MiniMax (api.minimax.io / api.minimaxi.com) as a first-class cloud
+  provider so it can register as a router arm alongside
+  anthropic/openai/google/mistral.
+
+  **API surface.** MiniMax ships *two* compatible HTTP surfaces, one
+  OpenAI-compatible and one Anthropic-compatible, so this is a
+  base-URL + auth wiring task, not a new translation layer:
+  - **OpenAI-compatible** chat-completions at `…/v1` — reusable via
+    `internal/provider/openaicompat`. Cleanest first cut: add a
+    `NewMiniMax(cfg)` constructor mirroring `NewOllama` /
+    `NewLlamaCpp` (`openaicompat/provider.go`) with the MiniMax base
+    URL baked in, then a `case "minimax"` in
+    `createProvider` (`cmd/gnoma/main.go:1265`) and the available-
+    providers usage string (`:1279`).
+  - **Anthropic-compatible** endpoint (`…/anthropic`) — alternative
+    backing via the existing `anthropic` provider with a `BaseURL`
+    override. Decide one canonical path; OpenAI-compat is the lower-
+    risk default since `openaicompat` is already exercised by the
+    local backends.
+  - **Auth.** Bearer API key. `envKeyFor`'s default branch
+    (`main.go:1199`) already resolves `MINIMAX_API_KEY` with no code
+    change; add an explicit `case "minimax"` only if we want a
+    friendlier name or alternates list.
+  - **Models.** `MiniMax-M2` (agentic/coding, the one to default to),
+    `MiniMax-M1`, abab6.5 series. Set `Strengths` + `MaxComplexity`
+    + `CostWeight` on the arm so the selector treats it as a cheap
+    high-capability cloud tier.
+
+  **Token plan (open question — affects auth + billing UX).** MiniMax
+  offers a flat-rate **Coding Plan** subscription (token-quota based,
+  Claude-Max-style) *in addition to* metered pay-as-you-go API
+  credits. Both authenticate with the same Bearer key, so no adapter
+  difference — but the router's `CostWeight` math assumes metered
+  per-token pricing. Under a subscription the marginal cost is ~0
+  until the quota is hit, then hard-stops. Decisions to make:
+  - How to model "subscription" cost in the selector — e.g. a
+    `[provider.minimax].billing = "subscription" | "metered"` knob
+    that zeroes `CostWeight` while quota remains, vs. real per-token
+    cost when metered.
+  - Quota exhaustion handling — surface the 429/quota error cleanly
+    and let the bandit fail over to the next arm (ties into the
+    session error-recovery work in `0d3d190`).
+  - Document both plans + the region split (`api.minimax.io`
+    international vs `api.minimaxi.com`) in `docs/slm-backends.md` /
+    provider docs.
+
+  Smallest shippable slice: OpenAI-compat `NewMiniMax` + metered
+  pricing, registered as a cloud arm. Subscription/quota modelling is
+  the follow-up once the billing knob lands. Plan:
+  [`docs/superpowers/plans/2026-06-04-minimax-provider.md`](docs/superpowers/plans/2026-06-04-minimax-provider.md).
+
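The open billing question in the MiniMax entry can be made concrete with a small sketch. It assumes a hypothetical `Billing` knob and a `QuotaRemaining` counter on the arm, and it replaces the bandit router with a plain "cheapest available arm" pick, so it illustrates the cost rule only, not the real selector.

```go
package main

import "fmt"

// Billing mirrors the "subscription" | "metered" knob proposed above.
type Billing string

const (
	Metered      Billing = "metered"
	Subscription Billing = "subscription"
)

// Arm is a simplified stand-in for a router arm.
type Arm struct {
	Name           string
	Billing        Billing
	CostWeight     float64 // weight applied when paying per token
	QuotaRemaining int     // hypothetical: tokens left in the plan
}

// Available reports whether the arm can take a request. An exhausted
// subscription hard-stops, so the selector has to fail over.
func Available(a Arm) bool {
	return a.Billing != Subscription || a.QuotaRemaining > 0
}

// EffectiveCostWeight is zero while subscription quota remains and the
// configured per-token weight otherwise.
func EffectiveCostWeight(a Arm) float64 {
	if a.Billing == Subscription && a.QuotaRemaining > 0 {
		return 0
	}
	return a.CostWeight
}

// Pick returns the available arm with the lowest effective weight.
func Pick(arms []Arm) (Arm, bool) {
	var best Arm
	found := false
	for _, a := range arms {
		if !Available(a) {
			continue
		}
		if !found || EffectiveCostWeight(a) < EffectiveCostWeight(best) {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	arms := []Arm{
		{Name: "minimax", Billing: Subscription, CostWeight: 0.3, QuotaRemaining: 1000},
		{Name: "openai", Billing: Metered, CostWeight: 0.8},
	}
	best, _ := Pick(arms)
	fmt.Println(best.Name)

	arms[0].QuotaRemaining = 0 // quota exhausted: fail over
	best, _ = Pick(arms)
	fmt.Println(best.Name)
}
```

Separating availability from cost keeps the "hard stop at quota" behaviour explicit instead of encoding it as a very large weight.
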
+- **Agent Client Protocol (ACP) support.** Run gnoma as an *ACP agent*
+  (`gnoma acp`) so any ACP-capable editor (Zed, Kiro, OpenCode, …) can
+  drive it as an external coding agent. ACP is "the LSP for AI coding
+  agents": JSON-RPC 2.0 over stdio, editor (client) spawns agent
+  (subprocess). gnoma already owns the hard parts — agentic engine,
+  tools, permissions, and JSON-RPC-over-stdio (from its MCP-client
+  side, `internal/mcp/jsonrpc.go`). The fit is symmetric: gnoma is the
+  JSON-RPC *server* here. No Go SDK exists (official SDKs are
+  TS/Python/Rust/Kotlin), so gnoma implements the wire protocol
+  natively against the schema. `session/new` can declare `mcpServers`,
+  so ACP and gnoma's existing MCP manager wire up in one handshake.
+
+  **Dual role — both directions:**
+  1. **gnoma as ACP agent (server)** — `gnoma acp` over stdio so
+     editors drive gnoma.
+  2. **gnoma as ACP client** — gnoma spawns *external* ACP agents
+     (Claude, Gemini CLI, Codex, …) and uses them as router-arm
+     provider backends. This is the same shape as the existing
+     `internal/provider/subprocess` CLI-agent arms
+     (`cmd/gnoma/main.go:521-531`, `IsCLIAgent: true`) but over
+     standardized ACP JSON-RPC — gaining structured tool-call
+     surfacing, real turn/permission semantics, and cancellation
+     that the current one-shot stream-json subprocess provider
+     lacks (it sets `ToolUse:false` for agents without stream-json).
+
+  Upstream: <https://github.com/agentclientprotocol>. Plan:
+  [`docs/superpowers/plans/2026-06-04-agent-client-protocol.md`](docs/superpowers/plans/2026-06-04-agent-client-protocol.md).
+
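As a rough picture of the server role described above, the sketch below dispatches JSON-RPC 2.0 requests read from a stream. It assumes newline-delimited framing (one JSON message per line), and every name in it (`Serve`, `Handler`, `DemoHandlers`) is hypothetical. The `session/new` result is a placeholder shape, not the ACP schema.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Request, RPCError and Response follow the JSON-RPC 2.0 envelope.
type Request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params,omitempty"`
}

type RPCError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type Response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
	Error   *RPCError       `json:"error,omitempty"`
}

// Handler is a hypothetical per-method callback.
type Handler func(params json.RawMessage) (any, error)

// Serve reads one request per line from r and writes one response per
// line to w. Requests without an id are notifications and get no reply.
func Serve(r io.Reader, w io.Writer, handlers map[string]Handler) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow lines up to 1 MiB
	enc := json.NewEncoder(w)                  // Encode appends "\n" after each value
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue
		}
		var req Request
		if err := json.Unmarshal(sc.Bytes(), &req); err != nil {
			resp := Response{JSONRPC: "2.0", ID: json.RawMessage("null"),
				Error: &RPCError{Code: -32700, Message: "parse error"}}
			if err := enc.Encode(resp); err != nil {
				return err
			}
			continue
		}
		h, ok := handlers[req.Method]
		if len(req.ID) == 0 { // notification: run the handler, send nothing back
			if ok {
				_, _ = h(req.Params)
			}
			continue
		}
		resp := Response{JSONRPC: "2.0", ID: req.ID}
		if !ok {
			resp.Error = &RPCError{Code: -32601, Message: "method not found"}
		} else if result, err := h(req.Params); err != nil {
			resp.Error = &RPCError{Code: -32603, Message: err.Error()}
		} else {
			resp.Result = result
		}
		if err := enc.Encode(resp); err != nil {
			return err
		}
	}
	return sc.Err()
}

// ServeString runs Serve over an in-memory input, for demos and tests.
func ServeString(input string, handlers map[string]Handler) string {
	var out strings.Builder
	_ = Serve(strings.NewReader(input), &out, handlers)
	return out.String()
}

// DemoHandlers registers a placeholder session/new handler.
func DemoHandlers() map[string]Handler {
	return map[string]Handler{
		"session/new": func(json.RawMessage) (any, error) {
			return map[string]string{"sessionId": "demo"}, nil
		},
	}
}

func main() {
	in := `{"jsonrpc":"2.0","id":1,"method":"session/new","params":{}}` + "\n"
	fmt.Print(ServeString(in, DemoHandlers()))
}
```

Taking an `io.Reader` and `io.Writer` rather than `os.Stdin` and `os.Stdout` directly lets the same loop run under a subprocess pipe in production and over in-memory buffers in tests.
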
 - **Config write/merge — silent corruption of layered configs.**
  `internal/config/write.go:setConfig` reads the existing TOML into a
  zero-valued `Config` struct, sets one field, and writes the entire
@@ -146,7 +268,10 @@ Active work, newest first.
     decision in #1.

  Surfaced from the r/coolgithubprojects v0.3.1 launch thread
-  (2026-05-24, `u/Ha_Deal_5079`).
+  (2026-05-24, `u/Ha_Deal_5079`). The encoder + contextual bandit
+  alternative is now sketched in
+  [`docs/superpowers/plans/2026-05-25-encoder-bandit-router.md`](docs/superpowers/plans/2026-05-25-encoder-bandit-router.md) —
+  that plan supersedes #1 above when it ships.

 - **Security boundary — egress controls + session audit log.** The
  current `Firewall` is a content boundary only (scans messages and
@@ -156,18 +281,98 @@ Active work, newest first.
  with no per-host allowlist or dial-layer interception. Two follow-
  ups surfaced from the r/SideProject v0.3.0 launch thread
  (2026-05-24, `u/Secret_Theme3192`):
-  1. **Per-session audit log of blocked/redacted events** —
-     grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
-     user can answer "what did the firewall do this session?" in
-     one command. Today the `slog` output goes to whatever sink is
-     configured, with no per-session grouping.
-  2. **Per-host egress allowlist (HTTP transport layer)** — open
-     design question: host-level (`allow api.openai.com, deny *`)
-     vs per-tool (`bash can only hit these hosts`). Reply asked
-     the commenter for their mental model; revisit when feedback
-     lands. The README and v0.3.0 Reddit post phrasing oversold
-     "network egress gated"; corrected in the same commit as this
-     TODO entry.
+  1. **Per-session audit log of blocked/redacted events** — ✅ JSONL
+     writing **implemented**: `internal/security/audit.go` +
+     wiring at `cmd/gnoma/main.go:685-691`
+     (`.gnoma/sessions/<id>/audit.jsonl`), recorded from
+     `firewall.go:152/173/186`. **Remaining gap:** no CLI to *read*
+     it — a `gnoma firewall audit` viewer is folded into the egress
+     plan (shares the `gnoma firewall` command surface).
+  2. **Per-host egress allowlist (HTTP transport layer)** — design
+     refined by `u/HarjjotSinghh` on the r/SideProject thread
+     (2026-05-28). Three-stage rollout, not a single-shot
+     "block everything except X" default:
+     - **Learn.** First run logs every egress destination per
+       (project, agent, tool) tuple without blocking.
+     - **Review.** New `gnoma firewall review` subcommand surfaces
+       the captured set; user marks each destination as
+       allow / deny / scoped.
+     - **Enforce.** Subsequent runs block unrecognised destinations
+       with a clear violation log (lives alongside the per-session
+       audit log from item #1).
+
+     Default baseline destinations (curated, ship-in-the-binary):
+     - **Package ecosystems:** github.com, npm registry,
+       pypi.org, crates.io, docker hub, golang.org/proxy.golang.org.
+     - **Model providers:** anthropic, openai, google, mistral —
+       plus user-configured local ollama / llamacpp endpoints
+       read from `[provider.endpoints]`.
+
+     The painful middle ground is SDK egress (sentry, stripe,
+     supabase, datadog, …) — these break a "block unknown"
+     default fast, which is why the Learn → Review → Enforce
+     flow is the only thing that scales. Per-tool scoping
+     (`bash` can only reach hosts X, MCP server Y can only reach
+     hosts Z) is the layer above the project-wide allowlist.
+
+     The README and v0.3.0 Reddit post phrasing oversold
+     "network egress gated"; corrected in the README scope note
+     and the audit-log commit.
+
+  Egress plan (incl. the `gnoma firewall audit` viewer for item #1):
+  [`docs/superpowers/plans/2026-06-04-egress-allowlist.md`](docs/superpowers/plans/2026-06-04-egress-allowlist.md).
+
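The Learn → Review → Enforce rollout can be sketched as a small policy object. This is a simplification under stated assumptions: it keys on host only (the entry proposes a per project/agent/tool tuple), it omits the "scoped" verdict, and `Policy`, `Check`, and `Review` are hypothetical names rather than the planned `gnoma firewall` implementation.

```go
package main

import "fmt"

// Mode is the rollout stage the policy is running in.
type Mode int

const (
	Learn Mode = iota
	Enforce
)

// Verdict is the user's reviewed decision for a destination.
type Verdict int

const (
	Unreviewed Verdict = iota
	Allow
	Deny
)

// Policy tracks reviewed verdicts and the hosts observed while learning.
type Policy struct {
	Mode     Mode
	Verdicts map[string]Verdict
	Seen     map[string]int // unreviewed hosts observed in Learn mode, with counts
}

// NewPolicy seeds the curated baseline destinations as allowed.
func NewPolicy(mode Mode, baseline ...string) *Policy {
	p := &Policy{Mode: mode, Verdicts: map[string]Verdict{}, Seen: map[string]int{}}
	for _, host := range baseline {
		p.Verdicts[host] = Allow
	}
	return p
}

// Check decides whether a connection to host may proceed. Learn mode
// never blocks and records unreviewed hosts; Enforce mode permits only
// hosts that were explicitly allowed.
func (p *Policy) Check(host string) bool {
	if p.Mode == Learn {
		if p.Verdicts[host] == Unreviewed {
			p.Seen[host]++
		}
		return true
	}
	return p.Verdicts[host] == Allow
}

// Review records the user's decision for a captured destination.
func (p *Policy) Review(host string, v Verdict) {
	p.Verdicts[host] = v
}

func main() {
	p := NewPolicy(Learn, "api.openai.com")
	p.Check("sentry.io")        // learned, not blocked
	p.Review("sentry.io", Deny) // user decision during review
	p.Mode = Enforce
	fmt.Println(p.Check("api.openai.com"), p.Check("sentry.io"), p.Check("example.com"))
}
```

In a real transport the `Check` call would sit at the dial layer, which is the gap the entry identifies in the current content-only `Firewall`.
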
+- **Cross-platform support — Windows + macOS.** GoReleaser builds
+  static binaries for `linux/darwin/windows × amd64/arm64` every
+  release but only Linux is exercised at all today. Windows and
+  macOS binaries ship untested. Surfaced 2026-05-28 (r/SideProject
+  reply to `u/HarjjotSinghh`) — answered "yes Windows builds ship"
+  but honestly couldn't claim they're tested. His framing was
+  specifically that the `r/devops` audience will surface predictable
+  questions "within a week" — list below maps each question to the
+  underlying gnoma-side gap.
+
+  ### Phase 1 — smoke tests (unblock the honest answer)
+
+  Non-blocking GitHub Actions matrix job per tag: pull each release
+  archive, run `gnoma --version && echo hi | gnoma --provider
+  ollama` against a stub provider. Confirms the binary executes and
+  the TUI doesn't crash before any real bug-hunt starts.
+
+  ### Phase 2 — Windows-specific concerns (r/devops question pattern)
+
+  Each row is an expected r/devops question, the gnoma-side gap it
+  exposes, and the rough fix scope. Order roughly by "how soon would
+  this come up in a thread":
+
+  | Question | Gap | Fix scope |
+  |---|---|---|
+  | "Does it work in PowerShell?" | Shell quoting in `internal/tool/bash` assumes POSIX; ANSI escape handling not tested against PowerShell + Windows Terminal | Add a PowerShell quoter (Quote a la `Get-Process "$arg"` rules); test ANSI emission against `Out-Host` and legacy `conhost.exe` |
+  | "WSL or native?" | Both should work; not documented; corporate-managed Windows VMs often lack WSL | One README line + a smoke test invocation under each |
+  | "Respects system proxy / corporate proxy?" | Go `http.Client` reads `HTTP_PROXY`/`HTTPS_PROXY` env vars but **does not** read Windows system proxy registry or PAC files. Corporate networks rely on these. | Either document the env-var workaround, or vendor a PAC-aware transport (e.g. `github.com/rapid7/go-get-proxied`); test path covered by Phase 1 smoke matrix |
+  | "Authenticode signed binary?" | Releases are unsigned; SmartScreen will warn, some corp policies block | GoReleaser supports cosign + signtool integration; needs an EV cert (or Azure Trusted Signing) — non-trivial cost. Document the workaround for now: "right-click → Properties → Unblock" |
+  | "MSI installer?" | We ship a zip; some shops can't deploy raw zips through SCCM / Intune | Add an `.msi` artifact to GoReleaser via `go-msi` or `wix`. Mid-effort; gated on whether anyone actually asks for it (post the question to the eventual r/devops thread, see who upvotes) |
+  | "Windows Event Viewer integration?" | Logs go to slog default sink + per-session audit log under project root | Document the audit log location explicitly; add a `--log-format=eventlog` mode later if anyone asks |
+  | "Group Policy hooks?" | None. Config is per-user TOML. | Out of scope short-term. Document `[provider.endpoints]` + `[router].prefer` as the levers admins would use via login script / config push |
+  | "Air-gapped install?" | Static binary works; ollama dependency is the problem (model downloads, runtime updates) | Document the offline flow: pre-download models via `ollama pull` on a connected machine, ship to the air-gapped network. Not a code change, just a doc gap |
+
+  ### Phase 3 — macOS concerns
+
+  Smaller surface; mostly Apple-silicon launch sanity (the arm64
+  binary works) + Gatekeeper / notarization warning on first run.
+  Same documentation note as Authenticode applies.
+
+  ### Pre-conditions for posting to r/devops
+
+  Per [[next-reddit-post]], the security-observation post should land
+  on r/devops eventually. **Don't post until Phase 1 is in place** so
+  the predictable "did you test it?" question has an honest answer.
+  Phase 2 items don't all need to ship first — but each one needs at
+  least a TODO-linked acknowledgement in the post body so the
+  thread sees gnoma takes the gaps seriously.
+
+  Plan (build-tag scaffolding + concrete code touch-points):
+  [`docs/superpowers/plans/2026-06-04-cross-platform.md`](docs/superpowers/plans/2026-06-04-cross-platform.md).

 - **Tool-router specialization (functiongemma)** — gated on telemetry,
  not committed. Phase A.2 adds did-switch-rate measurement to the
@@ -213,7 +418,8 @@ Active work, newest first.
  from `dockers` + `docker_manifests` to `dockers_v2` in
  `.goreleaser.yml` (collapses ~45 lines into one block but
  requires Dockerfile changes for the per-platform binary layout
-  — deferred to its own commit before v0.3.0).
+  — deferred to its own commit before v0.3.0). Plan:
+  [`docs/superpowers/plans/2026-06-04-distribution-followups.md`](docs/superpowers/plans/2026-06-04-distribution-followups.md).

 ## Stable backlog (not in active phases)

@@ -0,0 +1,122 @@
+package main
+
+import (
+	"fmt"
+	"os"
+
+	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
+)
+
+// runConfigCommand handles `gnoma config <subcommand>`. The
+// subcommand is the only CLI surface for writing to the layered
+// config (the rest of the binary reads via gnomacfg.Load).
+//
+// Subcommands:
+//   - set <key> <value>  write a key to the project config (or
+//     global with --global). Whitelisted keys
+//     only — see gnomacfg.AllowedKeys().
+//   - keys               list the whitelisted keys and what they do.
+func runConfigCommand(args []string) int {
+	if len(args) == 0 {
+		printConfigUsage(os.Stderr)
+		return 1
+	}
+	switch args[0] {
+	case "set":
+		return runConfigSet(args[1:])
+	case "keys":
+		return runConfigKeys()
+	case "help", "-h", "--help":
+		printConfigUsage(os.Stdout)
+		return 0
+	default:
+		fmt.Fprintf(os.Stderr, "unknown config command: %s\n", args[0])
+		printConfigUsage(os.Stderr)
+		return 1
+	}
+}
+
+func printConfigUsage(w *os.File) {
+	pfln(w, "usage: gnoma config <command>")
+	pfln(w, "commands:")
+	pfln(w, "  set <key> <value>   write a key to the project config (use --global for the global file)")
+	pfln(w, "  keys                list the whitelisted keys")
+}
+
+// pfln is the *os.File counterpart of pf/pln in profile_cmd.go: a thin
+// fmt.Fprintln wrapper that discards the write result. *os.File already
+// satisfies io.Writer, so the parameter could be widened to io.Writer
+// without changing any caller in this file; it is kept narrow to match
+// printConfigUsage. Cheap to define here.
+func pfln(w *os.File, args ...any) {
+	_, _ = fmt.Fprintln(w, args...)
+}
+
+func runConfigSet(args []string) int {
+	global := false
+	var keyArgs []string
+	// Manual flag parse to keep the surface tiny. Positionals go into a
+	// fresh slice so the caller's args is never mutated.
+	for _, a := range args {
+		if a == "--global" {
+			global = true
+			continue
+		}
+		keyArgs = append(keyArgs, a)
+	}
+	if len(keyArgs) != 2 {
+		fmt.Fprintln(os.Stderr, "usage: gnoma config set [--global] <key> <value>")
+		return 1
+	}
+	key, value := keyArgs[0], keyArgs[1]
+
+	var err error
+	if global {
+		err = gnomacfg.SetGlobalConfig(key, value)
+	} else {
+		err = gnomacfg.SetProjectConfig(key, value)
+	}
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "error: %v\n", err)
+		return 1
+	}
+
+	target := "project"
+	if global {
+		target = "global"
+	}
+	fmt.Printf("set %s = %q (%s config)\n", key, value, target)
+	return 0
+}
+
+func runConfigKeys() int {
+	fmt.Println("whitelisted config keys (gnoma config set <key> <value>):")
+	fmt.Println()
+
+	// Brief description for each key. Keep this in sync with
+	// the Config struct field tags and the defaults in
+	// gnomacfg.Defaults().
+	descriptions := map[string]string{
+		"provider.default": "default provider name (e.g. anthropic, openai, ollama)",
+		"provider.model":   "default model name (e.g. claude-opus-4-7)",
+		"permission.mode":  "permission mode: auto, allow, deny",
+		"slm.model_url":    "llamafile-only: URL to download the model binary from",
+		"slm.enabled":      "enable the SLM classifier (true/false)",
+		"slm.data_dir":     "llamafile-only: where to put the downloaded model",
+		"tui.theme":        "TUI theme name (e.g. catppuccin, dracula)",
+		"tui.vim":          "enable vim keybindings in the TUI (true/false)",
+	}
+	keys := gnomacfg.AllowedKeys()
+	for _, k := range keys {
+		desc, ok := descriptions[k]
+		if !ok {
+			desc = "(no description)"
+		}
+		fmt.Printf("  %-22s %s\n", k, desc)
+	}
+	fmt.Println()
+	fmt.Println("Tip: by default `set` writes to the project config")
+	fmt.Println("(.gnoma/config.toml). Pass --global to write to the")
+	fmt.Println("global config (~/.config/gnoma/config.toml) instead.")
+	return 0
+}
@@ -0,0 +1,91 @@
+package main
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+)
+
+// TestRunConfigSet_WritesAllowedKey exercises the `gnoma config set`
+// happy path: it writes the key to the project config file and
+// emits the confirmation line. The atomic write is verified by
+// `TestSetProjectConfig_AtomicWriteLeavesNoTempFile` in
+// internal/config; this test just covers the CLI plumbing.
+func TestRunConfigSet_WritesAllowedKey(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	// Run from a fresh project dir so projectConfigPath() picks
+	// up the new location.
+	origDir, _ := os.Getwd()
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.Chdir(projectDir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	// Set TUI theme to dracula.
+	if rc := runConfigSet([]string{"tui.theme", "dracula"}); rc != 0 {
+		t.Fatalf("runConfigSet rc=%d", rc)
+	}
+
+	// Project config should now contain the value.
+	data, err := os.ReadFile(filepath.Join(projectDir, ".gnoma", "config.toml"))
+	if err != nil {
+		t.Fatalf("read: %v", err)
+	}
+	if !strings.Contains(string(data), `theme = "dracula"`) {
+		t.Errorf("config missing set value, got:\n%s", data)
+	}
+}
+
+// TestRunConfigSet_RejectsUnknownKey verifies the CLI surfaces the
+// allowlist error rather than silently no-op'ing.
+func TestRunConfigSet_RejectsUnknownKey(t *testing.T) {
+	dir := t.TempDir()
+	origDir, _ := os.Getwd()
+	if err := os.Chdir(dir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	// The "error:" line goes to stderr and will show up in test output.
+	rc := runConfigSet([]string{"not.a.real.key", "x"})
+	if rc == 0 {
+		t.Errorf("expected non-zero rc for unknown key, got 0")
+	}
+}
+
+// TestRunConfigKeys_ListsAllAllowedKeys verifies the `keys`
+// subcommand surfaces every entry from gnomacfg.AllowedKeys().
+func TestRunConfigKeys_ListsAllAllowedKeys(t *testing.T) {
+	// Redirect stdout to a buffer; the function prints directly
+	// to os.Stdout.
+	origStdout := os.Stdout
+	r, w, _ := os.Pipe()
+	os.Stdout = w
+	t.Cleanup(func() { os.Stdout = origStdout })
+
+	rc := runConfigKeys()
+	_ = w.Close()
+	if rc != 0 {
+		t.Fatalf("runConfigKeys rc=%d", rc)
+	}
+
+	buf := make([]byte, 4096)
+	n, _ := r.Read(buf)
+	out := string(buf[:n])
+	for _, k := range []string{
+		"provider.default", "provider.model", "permission.mode",
+		"slm.model_url", "slm.enabled", "slm.data_dir",
+		"tui.theme", "tui.vim",
+	} {
+		if !strings.Contains(out, k) {
+			t.Errorf("keys output missing %q, got:\n%s", k, out)
+		}
+	}
+}
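The stdout-capturing tests read the pipe with a single fixed-size `Read`, which can return only part of a longer output. A hedged alternative is a small helper that drains the pipe concurrently; `captureStdout` and `emit` below are hypothetical names, not helpers that exist in this repository.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// captureStdout runs fn with os.Stdout redirected to a pipe and returns
// everything fn wrote. A goroutine drains the pipe while fn runs, so
// output larger than the OS pipe buffer does not block the writer.
func captureStdout(fn func()) (string, error) {
	orig := os.Stdout
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	os.Stdout = w
	defer func() { os.Stdout = orig }()

	done := make(chan []byte, 1)
	go func() {
		data, _ := io.ReadAll(r)
		done <- data
	}()

	fn()
	_ = w.Close() // closing the write end lets ReadAll see EOF
	data := <-done
	_ = r.Close()
	return string(data), nil
}

// emit returns a function that prints n bytes to stdout.
func emit(n int) func() {
	return func() { fmt.Print(strings.Repeat("x", n)) }
}

func main() {
	out, err := captureStdout(emit(200000))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(out))
}
```

In a test the call would look like `out, err := captureStdout(func() { rc = runConfigKeys() })`, with the assertions on `out` unchanged.
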
@@ -0,0 +1,159 @@
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+	"sort"
+
+	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
+)
+
+// runDoctorCommand handles `gnoma doctor`. Read-only diagnostic
+// over config files. Default: scans the project config (and
+// the global config if the project one is missing). With
+// `--all-projects`, walks the registry. With `--json`,
+// emits structured findings to stdout for CI consumption.
+// Exits non-zero on Warn+ findings (CI-friendly).
+func runDoctorCommand(args []string) int {
+	jsonOutput := false
+	allProjects := false
+	var pathArgs []string // positionals; built fresh so args is never rewritten mid-range
+	for _, a := range args {
+		switch a {
+		case "--json":
+			jsonOutput = true
+		case "--all-projects":
+			allProjects = true
+		default:
+			pathArgs = append(pathArgs, a)
+		}
+	}
+
+	var paths []string
+	switch {
+	case allProjects:
+		loaded, err := gnomacfg.LoadRegistry()
+		if err != nil {
+			fmt.Fprintf(os.Stderr, "error: load registry: %v\n", err)
+			return 1
+		}
+		// Always include the global config in --all-projects
+		// mode (it applies to every project). Then per-project
+		// configs from the registry. Files that don't exist
+		// are filtered out — the doctor reports a finding for
+		// them, but in --all-projects mode we silently skip
+		// rather than reporting every project root that has
+		// been visited but has no config.
+		paths = append(paths, gnomacfg.GlobalConfigPath())
+		for _, p := range loaded.Projects {
+			paths = append(paths, gnomacfg.ProjectConfigPathFor(p.Path))
+		}
+		// Dedupe and sort for deterministic output.
+		seen := map[string]bool{}
+		var deduped []string
+		for _, p := range paths {
+			if seen[p] {
+				continue
+			}
+			seen[p] = true
+			deduped = append(deduped, p)
+		}
+		sort.Strings(deduped)
+		paths = deduped
+	case len(pathArgs) == 0:
+		paths = []string{gnomacfg.ProjectConfigPath()}
+	case len(pathArgs) == 1:
+		paths = []string{pathArgs[0]}
+	default:
+		fmt.Fprintln(os.Stderr, "usage: gnoma doctor [--all-projects] [--json] [path]")
+		return 1
+	}
+
+	doc := gnomacfg.NewDoctor()
+	findings := doc.DiagnoseFiles(paths)
+
+	// Cross-file layering checks in --all-projects mode. For
+	// each registered project, compare the global config
+	// against the project's and surface shadowing cases —
+	// the original 2026-05-24 silent-corruption bug.
+	if allProjects {
+		loaded, err := gnomacfg.LoadRegistry()
+		if err == nil {
+			for _, p := range loaded.Projects {
+				projectPath := gnomacfg.ProjectConfigPathFor(p.Path)
+				if _, statErr := os.Stat(projectPath); statErr != nil {
+					continue
+				}
+				findings = append(findings, doc.DiagnoseLayering(gnomacfg.GlobalConfigPath(), projectPath)...)
+			}
+		}
+	}
+
+	return renderAndExit(findings, jsonOutput)
+}
+
+// renderAndExit emits findings to stdout (text or JSON per
+// the --json flag) and returns the exit code:
+//
+//	0 — clean (no findings, or only Info findings)
+//	1 — Warn or Error findings present
+//
+// Error findings indicate file-level failures (missing or
+// corrupt files); for those the message is the only signal.
+// Warn findings are the actionable ones — the user should
+// review and fix.
+func renderAndExit(findings []gnomacfg.Finding, jsonOutput bool) int {
+	if jsonOutput {
+		enc := json.NewEncoder(os.Stdout)
+		enc.SetIndent("", "  ")
+		if err := enc.Encode(findings); err != nil {
+			fmt.Fprintf(os.Stderr, "error: encode json: %v\n", err)
+			return 1
+		}
+	} else {
+		renderText(os.Stdout, findings)
+	}
+
+	for _, f := range findings {
+		if f.Severity >= gnomacfg.SeverityWarn {
+			return 1
+		}
+	}
+	return 0
+}
+
+// renderText writes findings in a human-readable columnar
+// format. Severity column, then path:key, then message.
+// Color is intentionally omitted — this is for terminals and
+// CI logs alike.
+func renderText(w *os.File, findings []gnomacfg.Finding) {
+	if len(findings) == 0 {
+		_, _ = fmt.Fprintln(w, "no findings — config looks clean")
+		return
+	}
+	// Find the longest path:key for column alignment.
+	maxWidth := 0
+	for _, f := range findings {
+		loc := f.Path
+		if f.Key != "" {
+			loc = f.Path + ":" + f.Key
+		}
+		if len(loc) > maxWidth {
+			maxWidth = len(loc)
+		}
+	}
+	for _, f := range findings {
+		loc := f.Path
+		if f.Key != "" {
+			loc = f.Path + ":" + f.Key
+		}
+		_, _ = fmt.Fprintf(w, "%-7s %-*s  %s\n", f.Severity, maxWidth, loc, f.Message)
+		if f.Suggestion != "" {
+			_, _ = fmt.Fprintf(w, "%-7s %-*s  → %s\n", "", maxWidth, "", f.Suggestion)
+		}
+	}
+}
+
+// Ensure the file ends cleanly.
+var _ = renderAndExit
@@ -0,0 +1,213 @@
+package main
+
+import (
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+
+	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
+)
+
+// TestRunDoctorCommand_CleanFileExitsZero verifies the
+// happy path: a valid config produces no findings and the
+// command exits 0.
+func TestRunDoctorCommand_CleanFileExitsZero(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	origDir, _ := os.Getwd()
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.Chdir(projectDir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	// Create a project config with a valid user value.
+	if err := os.MkdirAll(filepath.Join(projectDir, ".gnoma"), 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.WriteFile(
+		filepath.Join(projectDir, ".gnoma", "config.toml"),
+		[]byte("[provider]\ndefault = \"anthropic\"\n"),
+		0o644,
+	); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if rc := runDoctorCommand(nil); rc != 0 {
+		t.Errorf("rc = %d, want 0 for clean file", rc)
+	}
+}
+
+// TestRunDoctorCommand_WarnFindingExitsOne verifies the
+// CI-friendly exit code: a Warn finding (invalid enum
+// value) causes a non-zero exit.
+func TestRunDoctorCommand_WarnFindingExitsOne(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if rc := runDoctorCommand([]string{path}); rc != 1 {
+		t.Errorf("rc = %d, want 1 for warn finding", rc)
+	}
+}
+
+// TestRunDoctorCommand_JSONOutputIsValidJSON verifies the
+// --json flag emits parseable JSON to stdout, suitable for
+// CI/script consumption.
+func TestRunDoctorCommand_JSONOutputIsValidJSON(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	// Capture stdout.
+	origStdout := os.Stdout
+	r, w, _ := os.Pipe()
+	os.Stdout = w
+	t.Cleanup(func() { os.Stdout = origStdout })
+
+	rc := runDoctorCommand([]string{path, "--json"})
+	_ = w.Close()
+	if rc != 1 {
+		t.Errorf("rc = %d, want 1", rc)
+	}
+
+	buf := make([]byte, 8192)
+	n, _ := r.Read(buf)
+	out := string(buf[:n])
+
+	// Should be valid JSON array of Finding objects.
+	var findings []map[string]any
+	if err := json.Unmarshal([]byte(out), &findings); err != nil {
+		t.Fatalf("json.Unmarshal: %v\noutput:\n%s", err, out)
+	}
+	if len(findings) == 0 {
+		t.Errorf("json output had zero findings; expected at least one")
+	}
+	if findings[0]["severity"] != "warn" {
+		t.Errorf("severity = %v, want warn", findings[0]["severity"])
+	}
+}
+
+// TestRunDoctorCommand_TextOutputIncludesFindingKey verifies
+// the human-readable output format. Should include the file
+// path and the finding key.
+func TestRunDoctorCommand_TextOutputIncludesFindingKey(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	origStdout := os.Stdout
+	r, w, _ := os.Pipe()
+	os.Stdout = w
+	t.Cleanup(func() { os.Stdout = origStdout })
+
+	rc := runDoctorCommand([]string{path})
+	_ = w.Close()
+	if rc != 1 {
+		t.Errorf("rc = %d, want 1", rc)
+	}
+
+	buf := make([]byte, 4096)
+	n, _ := r.Read(buf)
+	out := string(buf[:n])
+
+	if !strings.Contains(out, "permission.mode") {
+		t.Errorf("output missing key, got:\n%s", out)
+	}
+	if !strings.Contains(out, path) {
+		t.Errorf("output missing path, got:\n%s", out)
+	}
+	if !strings.Contains(out, "warn") {
+		t.Errorf("output missing severity, got:\n%s", out)
+	}
+}
+
+// TestRunDoctorCommand_MissingFileExitsOne documents the
+// error path: a missing config file produces a single
+// SeverityError finding and the command exits 1.
+func TestRunDoctorCommand_MissingFileExitsOne(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "nonexistent.toml")
+
+	if rc := runDoctorCommand([]string{path}); rc != 1 {
+		t.Errorf("rc = %d, want 1 for missing file", rc)
+	}
+}
+
+// TestRunDoctorCommand_AllProjectsLayeringFires verifies the
+// 2026-06-04 follow-up: `gnoma doctor --all-projects` runs
+// cross-file layering checks between the global config and
+// every registered project's config, catching the original
+// silent-corruption bug.
+func TestRunDoctorCommand_AllProjectsLayeringFires(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	// Global has router.prefer = "cloud".
+	globalDir := filepath.Join(dir, "gnoma")
+	if err := os.MkdirAll(globalDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.WriteFile(
+		filepath.Join(globalDir, "config.toml"),
+		[]byte("[router]\nprefer = \"cloud\"\n"),
+		0o644,
+	); err != nil {
+		t.Fatalf("seed global: %v", err)
+	}
+
+	// Project has router.prefer = "" — the original symptom.
+	projectDir := filepath.Join(dir, "shadowed-project")
+	projectGnomaDir := filepath.Join(projectDir, ".gnoma")
+	if err := os.MkdirAll(projectGnomaDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.WriteFile(
+		filepath.Join(projectGnomaDir, "config.toml"),
+		[]byte("[router]\nprefer = \"\"\n"),
+		0o644,
+	); err != nil {
+		t.Fatalf("seed project: %v", err)
+	}
+
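+	// Register the project. Fail fast if the registry cannot be
+	// loaded instead of calling Record on an unusable value.
+	reg, err := gnomacfg.LoadRegistry()
+	if err != nil {
+		t.Fatalf("LoadRegistry: %v", err)
+	}
+	if err := reg.Record(projectDir); err != nil {
+		t.Fatalf("Record: %v", err)
+	}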
+
+	// Capture stdout.
+	origStdout := os.Stdout
+	r, w, _ := os.Pipe()
+	os.Stdout = w
+	t.Cleanup(func() { os.Stdout = origStdout })
+
+	rc := runDoctorCommand([]string{"--all-projects"})
+	_ = w.Close()
+	if rc != 1 {
+		t.Errorf("rc = %d, want 1 (shadowing finding should trigger non-zero exit)", rc)
+	}
+
+	buf := make([]byte, 8192)
+	n, _ := r.Read(buf)
+	out := string(buf[:n])
+
+	if !strings.Contains(out, "router.prefer") {
+		t.Errorf("output missing shadowing key, got:\n%s", out)
+	}
+	if !strings.Contains(out, "shadow") {
+		t.Errorf("output missing shadowing message, got:\n%s", out)
+	}
+}
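The stdout-capturing tests above read the pipe with a single `r.Read` into a fixed-size buffer, which can return a partial result when the output is larger than the buffer or arrives in several writes. A minimal sketch of a helper that drains the pipe fully follows. The names `captureStdout` and `writeLines` are hypothetical and not part of this change.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// captureStdout (hypothetical helper) swaps os.Stdout for a pipe, runs fn,
// and returns everything fn wrote. A goroutine drains the read end while fn
// runs, so output larger than the OS pipe buffer cannot block the writer.
func captureStdout(fn func()) (string, error) {
	orig := os.Stdout
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	os.Stdout = w
	defer func() { os.Stdout = orig }()

	done := make(chan []byte, 1)
	go func() {
		// ReadAll returns once the write end is closed below.
		data, _ := io.ReadAll(r)
		done <- data
	}()

	fn()
	_ = w.Close() // signals EOF to the reader goroutine
	data := <-done
	_ = r.Close()
	return string(data), nil
}

// writeLines (hypothetical) stands in for a command that prints to stdout.
func writeLines(n int) {
	for i := 0; i < n; i++ {
		fmt.Printf("line %d\n", i)
	}
}

func main() {
	out, err := captureStdout(func() { writeLines(3) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "capture failed:", err)
		os.Exit(1)
	}
	fmt.Printf("captured %d bytes\n", len(out))
}
```

In a test, the callback would wrap the `runDoctorCommand` call and the returned string would feed the existing `strings.Contains` or `json.Unmarshal` checks. Like the tests above, this swaps a process-wide variable, so it is not safe alongside `t.Parallel()`.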
@@ -87,6 +87,9 @@ func main() {
 		fmt.Fprintf(os.Stderr, "  gnoma slm setup         download and verify the llamafile model\n")
 		fmt.Fprintf(os.Stderr, "  gnoma slm status        show SLM setup state\n")
 		fmt.Fprintf(os.Stderr, "  gnoma router stats      show router quality + classifier telemetry\n")
+		fmt.Fprintf(os.Stderr, "  gnoma config            write a config key or list whitelisted keys\n")
+		fmt.Fprintf(os.Stderr, "  gnoma upgrade-config    clean a config file in place (--dry-run previews; --all walks the registry)\n")
+		fmt.Fprintf(os.Stderr, "  gnoma doctor            diagnostic scan; --all-projects walks the registry\n")
 		fmt.Fprintf(os.Stderr, "\nFlags:\n")
 		flag.PrintDefaults()
 	}
@@ -180,9 +183,15 @@ func main() {
 		case "slm":
 			os.Exit(runSLMCommand(cliArgs[1:], cfg, logger))
 		case "router":
-			os.Exit(runRouterCommand(cliArgs[1:], profile))
+			os.Exit(runRouterCommand(cliArgs[1:], cfg, profile))
 		case "profile":
 			os.Exit(runProfileCommand(cliArgs[1:], cfg, profile))
+		case "config":
+			os.Exit(runConfigCommand(cliArgs[1:]))
+		case "upgrade-config":
+			os.Exit(runUpgradeConfigCommand(cliArgs[1:]))
+		case "doctor":
+			os.Exit(runDoctorCommand(cliArgs[1:]))
 		}
 	}

@@ -230,6 +239,31 @@ func main() {
 	}, safety.ScanCWDForSensitive(cwdAbs))
 	fmt.Fprint(os.Stderr, banner)

+	// Resolve the config once, here, so the rest of the startup
+	// path (registry, firewall, tool registry, etc.) all share
+	// one Resolved view. Pointer-converted fields with defaults
+	// substituted are read via resolved.*; raw cfg.* is
+	// internal after this point.
+	resolved := cfg.Resolved()
+
+	// Record the project in the user-level registry (Phase 2 of
+	// the 2026-05-24 config-migration plan). Failure is
+	// non-fatal — the registry is a convenience for
+	// `gnoma doctor --all-projects` and
+	// `gnoma upgrade-config --all`, never a hard dependency
+	// on startup. Resolved().ProjectRegistry defaults to true;
+	// the user can opt out via [config].project_registry = false
+	// in their config file.
+	if resolved.ProjectRegistry {
+		if reg, err := gnomacfg.LoadRegistry(); err != nil {
+			logger.Warn("project registry load failed (continuing)",
+				"path", gnomacfg.RegistryFilePath(), "error", err)
+		} else if err := reg.Record(gnomacfg.ProjectRoot()); err != nil {
+			logger.Warn("project registry record failed (continuing)",
+				"project", gnomacfg.ProjectRoot(), "error", err)
+		}
+	}
+
 	knownProviders := map[string]bool{
 		"mistral": true, "anthropic": true, "openai": true,
 		"google": true, "ollama": true, "llamacpp": true,
@@ -319,8 +353,8 @@ func main() {

 	// Create tool registry
 	reg := buildToolRegistry(fsGuard)
-	if cfg.Tools.MaxFileSize > 0 {
-		w := fs.NewWriteTool(fs.WithMaxFileSize(cfg.Tools.MaxFileSize))
+	if resolved.Tools.MaxFileSize > 0 {
+		w := fs.NewWriteTool(fs.WithMaxFileSize(resolved.Tools.MaxFileSize))
 		w.SetGuard(fsGuard)
 		reg.Register(w)
 	}
@@ -387,7 +421,7 @@ func main() {

 	// Create session store. Per-profile session dir keeps work/private
 	// sessions from cross-contaminating the resume list.
-	sessStore := session.NewSessionStoreAt(profile.SessionDir(gnomacfg.ProjectRoot()), cfg.Session.MaxKeep, logger)
+	sessStore := session.NewSessionStoreAt(profile.SessionDir(gnomacfg.ProjectRoot()), resolved.Session.MaxKeep, logger)

 	// FirewallRef holds the *Firewall via atomic.Pointer so it can be
 	// installed into SafeProvider wrappers before NewFirewall runs below
@@ -591,10 +625,7 @@ func main() {
 	)

 	// Create firewall
-	entropyThreshold := 4.5
-	if cfg.Security.EntropyThreshold > 0 {
-		entropyThreshold = cfg.Security.EntropyThreshold
-	}
+	entropyThreshold := resolved.Security.EntropyThreshold
 	fw := security.NewFirewall(security.FirewallConfig{
 		ScanOutgoing:     true,
 		ScanToolResults:  true,
@@ -821,7 +852,7 @@ func main() {
 	}

 	// Derive context window size from registered arm capabilities (accurate) or fall back to heuristic
-	contextWindowSize := int64(cfg.Provider.MaxTokens) * 20
+	contextWindowSize := resolved.Provider.MaxTokens * 20
 	if arm, ok := rtr.LookupArm(armID); ok && arm.Capabilities.ContextWindow > 0 {
 		contextWindowSize = int64(arm.Capabilities.ContextWindow)
 		logger.Debug("context window from arm capabilities", "arm", armID, "context_window", contextWindowSize)
@@ -867,7 +898,7 @@ func main() {
 			BaseURL:        cfg.SLM.BaseURL,
 			ModelURL:       cfg.SLM.ModelURL,
 			DataDir:        cfg.SLM.DataDir,
-			StartupTimeout: cfg.SLM.StartupTimeout.Duration(),
+			StartupTimeout: resolved.SLM.StartupTimeout,
 		}
 		fmt.Fprintln(os.Stderr, "Starting SLM...")
 		boot, bootErr := slm.StartBackend(context.Background(), bcfg, logger)
@@ -881,21 +912,35 @@ func main() {
 			// transport and as a router arm. Both paths route through the
 			// firewall after fwRef.Set fires above.
 			slmProvider := security.WrapProvider(boot.Provider, fwRef)
-			lazy.set(slm.NewClassifier(slmProvider, boot.Model, logger))
+			lazy.set(slm.NewClassifier(slmProvider, boot.Model, resolved.SLM.ClassifyTimeout, logger))
 			// ToolUse comes from the live probe of the actual model. For
 			// completion-only models (e.g. TinyLlama), the SLM arm only
 			// handles knowledge-only prompts where the trivial-prompt
 			// heuristic flipped RequiresTools=false. For tool-capable
 			// models, the SLM also covers simple file reads etc., gated
 			// by MaxComplexity=0.3.
-			rtr.RegisterArm(&router.Arm{
-				ID:            router.ArmID("slm/" + string(boot.Backend)),
-				Provider:      slmProvider,
-				ModelName:     boot.Model,
-				IsLocal:       true,
-				MaxComplexity: 0.3,
-				Capabilities:  provider.Capabilities{ToolUse: boot.ToolSupport},
-			})
+			//
+			// [slm].register_as_arm gates the dual-role registration.
+			// Default (nil) is true to preserve pre-config behaviour.
+			// Explicit false makes the SLM classifier-only, which is
+			// the correct setting for task-specialised models
+			// (FunctionGemma, code-completion-tuned models, etc.) that
+			// would mishandle a general prompt routed to them as the
+			// answer-producing arm. Resolved() applies the default-true
+			// substitution; see ResolvedSLMSection in resolve.go.
+			if resolved.SLM.RegisterAsArm {
+				rtr.RegisterArm(&router.Arm{
+					ID:            router.ArmID("slm/" + string(boot.Backend)),
+					Provider:      slmProvider,
+					ModelName:     boot.Model,
+					IsLocal:       true,
+					MaxComplexity: 0.3,
+					Capabilities:  provider.Capabilities{ToolUse: boot.ToolSupport},
+				})
+			} else {
+				logger.Info("SLM registered as classifier only ([slm].register_as_arm=false)",
+					"model", boot.Model)
+			}
 			slmCleanup = boot.Close
 			slmInfo.Active = true
 			slmInfo.Backend = string(boot.Backend)
@@ -938,7 +983,7 @@ func main() {
 		Store:              store,
 		Hooks:              dispatcher,
 		Logger:             logger,
-		ForceTwoStageTools: cfg.Router.ForceTwoStage,
+		ForceTwoStageTools: resolved.Router.ForceTwoStage,
 	})
 	if err != nil {
 		fmt.Fprintf(os.Stderr, "error: %v\n", err)
@@ -158,6 +158,7 @@ func runProfileShow(name string) int {
 // API key *values* are never printed — only the set of configured
 // providers. Extracted for testing.
 func formatProfileShow(w io.Writer, cfg *gnomacfg.Config, profile gnomacfg.Profile, profilePath, baseConfigPath, globalDir, projectRoot string) {
+	resolved := cfg.Resolved()
 	if profile.Active {
 		pf(w, "Profile: %s\n", profile.Name)
 	} else {
@@ -176,8 +177,8 @@ func formatProfileShow(w io.Writer, cfg *gnomacfg.Config, profile gnomacfg.Profi
 	if cfg.Provider.Model != "" {
 		pf(w, "  model       = %s\n", cfg.Provider.Model)
 	}
-	if cfg.Provider.MaxTokens > 0 {
-		pf(w, "  max_tokens  = %d\n", cfg.Provider.MaxTokens)
+	if resolved.Provider.MaxTokens > 0 {
+		pf(w, "  max_tokens  = %d\n", resolved.Provider.MaxTokens)
 	}
 	if len(cfg.Provider.APIKeys) > 0 {
 		pf(w, "  api_keys    = %s\n", sortedKeys(cfg.Provider.APIKeys))
@@ -227,24 +228,24 @@ func formatProfileShow(w io.Writer, cfg *gnomacfg.Config, profile gnomacfg.Profi
 		}
 	}

-	if cfg.Router.ForceTwoStage {
+	if resolved.Router.ForceTwoStage {
 		pln(w, "\n[router]")
-		pf(w, "  force_two_stage = %v\n", cfg.Router.ForceTwoStage)
+		pf(w, "  force_two_stage = %v\n", resolved.Router.ForceTwoStage)
 	}

-	if cfg.Tools.BashTimeout.Duration() > 0 || cfg.Tools.MaxFileSize > 0 {
+	if resolved.Tools.BashTimeout > 0 || resolved.Tools.MaxFileSize > 0 {
 		pln(w, "\n[tools]")
-		if cfg.Tools.BashTimeout.Duration() > 0 {
-			pf(w, "  bash_timeout   = %s\n", cfg.Tools.BashTimeout.Duration())
+		if resolved.Tools.BashTimeout > 0 {
+			pf(w, "  bash_timeout   = %s\n", resolved.Tools.BashTimeout)
 		}
-		if cfg.Tools.MaxFileSize > 0 {
-			pf(w, "  max_file_size  = %d\n", cfg.Tools.MaxFileSize)
+		if resolved.Tools.MaxFileSize > 0 {
+			pf(w, "  max_file_size  = %d\n", resolved.Tools.MaxFileSize)
 		}
 	}

-	if cfg.Session.MaxKeep > 0 {
+	if resolved.Session.MaxKeep > 0 {
 		pln(w, "\n[session]")
-		pf(w, "  max_keep = %d\n", cfg.Session.MaxKeep)
+		pf(w, "  max_keep = %d\n", resolved.Session.MaxKeep)
 	}

 	pln(w)
@@ -185,7 +185,7 @@ func TestFormatProfileShow_PopulatedConfig(t *testing.T) {
 		{Name: "fs", Command: "mcp-fs"},
 	}
 	cfg.Plugins.Enabled = []string{"git-tools"}
-	cfg.Router.ForceTwoStage = true
+	cfg.Router.ForceTwoStage = func() *bool { v := true; return &v }()

 	prof := gnomacfg.Profile{Active: true, Name: "work"}

@@ -12,7 +12,7 @@ import (
 )

 // runRouterCommand handles `gnoma router <subcommand>`. Returns an exit code.
-func runRouterCommand(args []string, profile gnomacfg.Profile) int {
+func runRouterCommand(args []string, cfg *gnomacfg.Config, profile gnomacfg.Profile) int {
 	if len(args) == 0 {
 		fmt.Fprintln(os.Stderr, "usage: gnoma router <command>")
 		fmt.Fprintln(os.Stderr, "commands:")
@@ -21,14 +21,14 @@ func runRouterCommand(args []string, profile gnomacfg.Profile) int {
 	}
 	switch args[0] {
 	case "stats":
-		return runRouterStats(profile)
+		return runRouterStats(cfg, profile)
 	default:
 		fmt.Fprintf(os.Stderr, "unknown router command: %s\n", args[0])
 		return 1
 	}
 }

-func runRouterStats(profile gnomacfg.Profile) int {
+func runRouterStats(cfg *gnomacfg.Config, profile gnomacfg.Profile) int {
 	path := profile.QualityFile(gnomacfg.GlobalConfigDir())
 	data, err := os.ReadFile(path)
 	if err != nil {
@@ -52,7 +52,7 @@ func runRouterStats(profile gnomacfg.Profile) int {
 	}
 	printArmTable(snap)
 	fmt.Println()
-	printClassifierTable(snap)
+	printClassifierTable(snap, cfg)
 	return 0
 }

@@ -86,7 +86,7 @@ func printArmTable(snap router.QualitySnapshot) {
 	_ = tw.Flush()
 }

-func printClassifierTable(snap router.QualitySnapshot) {
+func printClassifierTable(snap router.QualitySnapshot, cfg *gnomacfg.Config) {
 	fmt.Println("Classifier source breakdown:")
 	counts := snap.ClassifierCounts
 	if len(counts) == 0 {
@@ -125,16 +125,39 @@ func printClassifierTable(snap router.QualitySnapshot) {
 	_ = tw.Flush()
 	fmt.Printf("  total observations: %d\n", total)

-	// Phase-4 trust hint.
+	// Effective heuristic share: both pure heuristic and slm_fallback
+	// observations were routed via the HeuristicClassifier — the only
+	// difference is whether the SLM was attempted first. Surfacing the
+	// combined share answers "how often did the SLM actually drive
+	// routing?" honestly.
+	effectiveHeuristic := counts["heuristic"] + counts["slm_fallback"]
+	if total > 0 {
+		fmt.Printf("  effective heuristic share: %.1f%% (%d fallbacks + %d pure heuristic)\n",
+			float64(effectiveHeuristic)/float64(total)*100,
+			counts["slm_fallback"], counts["heuristic"])
+	}
+
+	// Phase-4 trust hint. Distinguishes the three diagnostic cases —
+	// SLM never called, SLM called but every call failed, SLM working
+	// but minority share — and templates the actionable advice off
+	// the configured backend so the hint doesn't mention llamafile
+	// when the user is on ollama (or vice versa).
 	slmShare := 0.0
 	if total > 0 {
 		slmShare = float64(counts["slm"]) / float64(total) * 100
 	}
+	backend := "the SLM"
+	if cfg != nil && cfg.SLM.Backend != "" {
+		backend = cfg.SLM.Backend
+	}
 	switch {
 	case total < 50:
 		fmt.Println("  hint: < 50 observations — too sparse for Phase 4 trust signal yet.")
-	case counts["slm"] == 0:
-		fmt.Println("  hint: SLM has never classified — check that llamafile boots before short-lived runs end.")
+	case counts["slm"] == 0 && counts["slm_fallback"] == 0:
+		fmt.Printf("  hint: SLM never called — check [slm].enabled and that %s is reachable.\n", backend)
+	case counts["slm"] == 0 && counts["slm_fallback"] > 0:
+		fmt.Printf("  hint: SLM was called %d times but every call fell back — run with `--verbose` to see the underlying error (likely a timeout or parse failure for %s).\n",
+			counts["slm_fallback"], backend)
 	case slmShare < 50:
 		fmt.Printf("  hint: SLM share is %.0f%% — fallback is doing most of the work.\n", slmShare)
 	}
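The hint switch above separates three diagnostic cases once there are at least 50 observations. A minimal sketch of the same decision as a pure function, which makes each case easy to test in isolation, is shown below. It assumes that `total` is the sum of all classifier counts and that the counts are a `map[string]int`; neither is confirmed by this hunk. The name `classifierHint` and the shortened messages are illustrative only.

```go
package main

import "fmt"

// classifierHint (hypothetical) mirrors the hint switch in
// printClassifierTable. counts maps a classifier source ("slm",
// "slm_fallback", "heuristic") to its observation count; backend is the
// configured SLM backend name used to template the advice.
// Assumption: total is the sum of all counts.
func classifierHint(counts map[string]int, backend string) string {
	total := 0
	for _, n := range counts {
		total += n
	}
	slm, fallback := counts["slm"], counts["slm_fallback"]

	switch {
	case total < 50:
		return "too sparse for a trust signal"
	case slm == 0 && fallback == 0:
		// The SLM was never attempted at all.
		return fmt.Sprintf("SLM never called; check [slm].enabled and that %s is reachable", backend)
	case slm == 0:
		// The SLM was attempted, but every attempt fell back.
		return fmt.Sprintf("SLM was called %d times but every call fell back (backend: %s)", fallback, backend)
	case float64(slm)/float64(total)*100 < 50:
		return fmt.Sprintf("SLM share is %.0f%%; fallback is doing most of the work",
			float64(slm)/float64(total)*100)
	}
	return "" // SLM drives the majority of routing: no hint needed
}

func main() {
	fmt.Println(classifierHint(map[string]int{"slm_fallback": 40, "heuristic": 20}, "ollama"))
}
```

Case order matters: the sparse-data guard comes first so the share calculation never runs on a handful of observations, and the "never called" case precedes "every call fell back" because both have `slm == 0`.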
@@ -0,0 +1,216 @@
+package main
+
+import (
+	"fmt"
+	"os"
+	"sort"
+
+	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
+)
+
+// runUpgradeConfigCommand handles `gnoma upgrade-config`. Cleans
+// a single config file in place: drops fields whose value matches
+// the resolved default, leaves explicit-zero pointer fields alone,
+// writes the cleaned form atomically with a `.bak-YYYYMMDD-HHMMSS`
+// backup of the original.
+//
+// Modes:
+//   - `gnoma upgrade-config` (no args) → project config
+//   - `gnoma upgrade-config --global`  → global config
+//   - `gnoma upgrade-config <path>`   → the given path
+//   - `gnoma upgrade-config --all`    → walk the registry,
+//     upgrade global + every
+//     known project's config
+//   - `gnoma upgrade-config --global <path>` → error (mutually exclusive)
+//   - `gnoma upgrade-config --all <path>`    → error (mutually exclusive)
+//
+// If the default target (project or global config) doesn't exist,
+// print a friendly "nothing to upgrade" message and exit 0 rather
+// than failing. The user can pass an explicit path to upgrade a
+// different file; a missing explicit path is a real error.
+// `--all` reports per-file results and exits 1 if any per-file
+// handler returned non-zero. There is no --strict mode: dry-run
+// changes alone do not affect the exit code.
+func runUpgradeConfigCommand(args []string) int {
+	// Walk args in a single pass, building pathArgs into a fresh
+	// slice. Using args[:i] / args[i+1:] in-place would alias the
+	// underlying array and corrupt subsequent iterations' `a`
+	// reads (a known Go slice footgun). The fresh-slice approach
+	// keeps the parsing correct regardless of flag ordering.
+	var pathArgs []string
+	dryRun := false
+	global := false
+	all := false
+	for _, a := range args {
+		switch a {
+		case "--dry-run":
+			dryRun = true
+		case "--global":
+			global = true
+		case "--all":
+			all = true
+		default:
+			pathArgs = append(pathArgs, a)
+		}
+	}
+
+	// --global / --all and an explicit path are mutually exclusive.
+	if (global || all) && len(pathArgs) > 0 {
+		fmt.Fprintln(os.Stderr, "usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")
+		return 1
+	}
+	if global && all {
+		fmt.Fprintln(os.Stderr, "usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")
+		return 1
+	}
+
+	// --all mode: walk the registry.
+	if all {
+		return runUpgradeConfigAll(dryRun)
+	}
+
+	target := ""
+	switch {
+	case global:
+		target = gnomacfg.GlobalConfigPath()
+	case len(pathArgs) == 0:
+		target = gnomacfg.ProjectConfigPath()
+	case len(pathArgs) == 1:
+		target = pathArgs[0]
+	default:
+		fmt.Fprintln(os.Stderr, "usage: gnoma upgrade-config [--dry-run] [--global | --all | <path>]")
+		return 1
+	}
+
+	// Friendly "nothing to upgrade" when the default target
+	// doesn't exist. We only do this for the default targets
+	// (project/global); an explicit path the user typed that
+	// doesn't exist is a real error surfaced by Upgrade() below.
+	if global || len(pathArgs) == 0 {
+		if _, err := os.Stat(target); os.IsNotExist(err) {
+			fmt.Printf("%s: no such file, nothing to upgrade\n", target)
+			fmt.Println("hint: pass an explicit path, or use --global for the user-level config")
+			return 0
+		}
+	}
+
+	if dryRun {
+		return runUpgradeConfigDryRun(target)
+	}
+	return runUpgradeConfigApply(target)
+}
+
+// runUpgradeConfigAll walks the registry and upgrades the
+// global config + every known project's config. Per-file
+// behaviour mirrors the single-file path: friendly "no such
+// file" exit 0 when the project hasn't grown its config yet,
+// real Upgrade() on files that exist, backup+diff on changes.
+// Returns non-zero only if a per-file handler failed; dry-run
+// changes are printed but do not affect the exit code.
+func runUpgradeConfigAll(dryRun bool) int {
+	loaded, err := gnomacfg.LoadRegistry()
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "error: load registry: %v\n", err)
+		return 1
+	}
+
+	// Always include the global config; then per-project.
+	paths := []string{gnomacfg.GlobalConfigPath()}
+	for _, p := range loaded.Projects {
+		paths = append(paths, gnomacfg.ProjectConfigPathFor(p.Path))
+	}
+	// Dedupe + sort for deterministic output. (Dedupe guards
+	// against two registry entries resolving to the same config
+	// path, which is uncommon but possible.)
+	seen := map[string]bool{}
+	var deduped []string
+	for _, p := range paths {
+		if seen[p] {
+			continue
+		}
+		seen[p] = true
+		deduped = append(deduped, p)
+	}
+	sort.Strings(deduped)
+	paths = deduped
+
+	anyFailed := false
+	anyChanged := false
+	for _, p := range paths {
+		// Friendly "no such file" on first run — many registered
+		// projects won't have a .gnoma/config.toml yet.
+		if _, err := os.Stat(p); os.IsNotExist(err) {
+			fmt.Printf("%s: no such file, nothing to upgrade\n", p)
+			continue
+		}
+
+		var rc int
+		if dryRun {
+			rc = runUpgradeConfigDryRun(p)
+		} else {
+			rc = runUpgradeConfigApply(p)
+		}
+		if rc != 0 {
+			anyFailed = true
+		}
+		// Per-file handlers print their own "upgraded" /
+		// "already clean" line; the aggregate exit code just
+		// reports "any failure". (Tracking "any change" would
+		// need a non-printing variant of the helpers; deferred.)
+		_ = anyChanged
+	}
+
+	if anyFailed {
+		return 1
+	}
+	return 0
+}
+
+func runUpgradeConfigApply(path string) int {
+	res, err := gnomacfg.Upgrade(path)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "error: %v\n", err)
+		return 1
+	}
+	if !res.Changed {
+		fmt.Printf("%s: already clean, nothing to do\n", path)
+		return 0
+	}
+	fmt.Printf("%s: upgraded (backup at %s)\n\n", path, res.BackupPath)
+	fmt.Println(res.Diff)
+	return 0
+}
+
+func runUpgradeConfigDryRun(path string) int {
+	// For the dry-run, snapshot the file, run Upgrade, restore
+	// the original from the backup, and only print the diff.
+	// (Upgrade is destructive by design — it writes the cleaned
+	// form before we have a chance to inspect the diff. The
+	// backup+restore dance lets us preview without committing.)
+	res, err := gnomacfg.Upgrade(path)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "error: %v\n", err)
+		return 1
+	}
+	if !res.Changed {
+		fmt.Printf("%s: already clean, nothing to do (dry run)\n", path)
+		return 0
+	}
+	// Restore the original from the backup so the dry run leaves
+	// the file byte-identical. os.Rename moves the backup back
+	// over the cleaned file, so on success no stray .bak is
+	// left and there is nothing further to remove. If the
+	// restore fails, the file on disk is the cleaned form and
+	// the original survives only at res.BackupPath, so report
+	// that as an error instead of claiming nothing was written.
+	if err := os.Rename(res.BackupPath, path); err != nil {
+		fmt.Fprintf(os.Stderr, "error: dry-run restore failed (original kept at %s): %v\n", res.BackupPath, err)
+		return 1
+	}
+	fmt.Printf("%s: would upgrade (dry run; no changes written)\n\n", path)
+	fmt.Println(res.Diff)
+	return 0
+}
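The comment in `runUpgradeConfigCommand` warns against splicing `args` in place while ranging over it. A minimal sketch of that footgun next to the fresh-slice approach is shown below. The function names `stripFlagsInPlace` and `splitArgs` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// stripFlagsInPlace (hypothetical) shows the buggy pattern: it splices the
// slice while ranging over it. range keeps reading from the original backing
// array, which append has just shifted, so the element that slid into the
// current index is never inspected.
func stripFlagsInPlace(args []string, flags map[string]bool) []string {
	for i, a := range args {
		if flags[a] {
			args = append(args[:i], args[i+1:]...)
		}
	}
	return args
}

// splitArgs (hypothetical) is the fresh-slice approach: one pass, flags are
// recorded in a set, and everything else is appended to a new slice. The
// input is never mutated, so flag ordering cannot affect the result.
func splitArgs(args []string, flags map[string]bool) (seen map[string]bool, paths []string) {
	seen = map[string]bool{}
	for _, a := range args {
		if flags[a] {
			seen[a] = true
			continue
		}
		paths = append(paths, a)
	}
	return seen, paths
}

func main() {
	known := map[string]bool{"--dry-run": true, "--global": true, "--all": true}

	// The in-place version mutates its input, so give it its own slice.
	buggy := stripFlagsInPlace([]string{"--dry-run", "--global", "cfg.toml"}, known)
	fmt.Println("in place:", buggy)

	seen, paths := splitArgs([]string{"--dry-run", "--global", "cfg.toml"}, known)
	fmt.Println("fresh slice:", seen, paths)
}
```

With the in-place version, `--global` survives as if it were a path argument, which is the kind of ordering-dependent misparse the fresh-slice loop avoids.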
@@ -0,0 +1,292 @@
+package main
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+
+	gnomacfg "somegit.dev/Owlibou/gnoma/internal/config"
+)
+
+// TestRunUpgradeConfig_DropsDefaultPointerField exercises the
+// happy path: a project config with `max_tokens = 8192` (the
+// default) gets the field dropped and a backup created.
+func TestRunUpgradeConfig_DropsDefaultPointerField(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	origDir, _ := os.Getwd()
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.Chdir(projectDir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	path := filepath.Join(projectDir, ".gnoma", "config.toml")
+	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if rc := runUpgradeConfigApply(path); rc != 0 {
+		t.Fatalf("runUpgradeConfigApply rc=%d", rc)
+	}
+	got, _ := os.ReadFile(path)
+	if strings.Contains(string(got), "max_tokens") {
+		t.Errorf("max_tokens at default not dropped, got:\n%s", got)
+	}
+	// Backup file exists.
+	entries, _ := os.ReadDir(filepath.Dir(path))
+	backupFound := false
+	for _, e := range entries {
+		if strings.HasPrefix(e.Name(), "config.toml.bak-") {
+			backupFound = true
+			break
+		}
+	}
+	if !backupFound {
+		t.Errorf("no backup file created in %s", filepath.Dir(path))
+	}
+}
+
+// TestRunUpgradeConfig_DryRunNoSideEffects verifies that
+// --dry-run previews the diff without leaving the file modified.
+func TestRunUpgradeConfig_DryRunNoSideEffects(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	origDir, _ := os.Getwd()
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.Chdir(projectDir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	path := filepath.Join(projectDir, ".gnoma", "config.toml")
+	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	original := "[provider]\nmax_tokens = 8192\n"
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if rc := runUpgradeConfigDryRun(path); rc != 0 {
+		t.Fatalf("runUpgradeConfigDryRun rc=%d", rc)
+	}
+
+	// File should be byte-identical to the original.
+	got, _ := os.ReadFile(path)
+	if string(got) != original {
+		t.Errorf("dry-run modified the file, got:\n%s\nwant:\n%s", got, original)
+	}
+
+	// No backup file should remain (dry-run cleans up its own backup).
+	entries, _ := os.ReadDir(filepath.Dir(path))
+	for _, e := range entries {
+		if e.Name() != "config.toml" {
+			t.Errorf("dry-run left extra file: %q", e.Name())
+		}
+	}
+}
+
+// TestRunUpgradeConfig_AlreadyCleanIsNoOp verifies that a config
+// that has only user-set non-default values produces a "nothing
+// to do" message and exit 0 — no backup, no rewrite.
+func TestRunUpgradeConfig_AlreadyCleanIsNoOp(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	origDir, _ := os.Getwd()
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.Chdir(projectDir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	path := filepath.Join(projectDir, ".gnoma", "config.toml")
+	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	clean := "[provider]\ndefault = \"anthropic\"\n"
+	if err := os.WriteFile(path, []byte(clean), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if rc := runUpgradeConfigApply(path); rc != 0 {
+		t.Errorf("rc = %d, want 0 for already-clean file", rc)
+	}
+
+	// File content unchanged.
+	got, _ := os.ReadFile(path)
+	if string(got) != clean {
+		t.Errorf("already-clean file modified, got:\n%s", got)
+	}
+	// No backup created.
+	entries, _ := os.ReadDir(filepath.Dir(path))
+	for _, e := range entries {
+		if e.Name() != "config.toml" {
+			t.Errorf("no-op left extra file: %q", e.Name())
+		}
+	}
+}
+
+// TestRunUpgradeConfig_MissingProjectConfigIsFriendly verifies the
+// user-experience fix for the 2026-06-04 follow-up: when the
+// project .gnoma/config.toml doesn't exist, print a friendly
+// "nothing to upgrade" message and exit 0 instead of a hard
+// "no such file or directory" error. The user can pass an
+// explicit path or use --global.
+func TestRunUpgradeConfig_MissingProjectConfigIsFriendly(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	origDir, _ := os.Getwd()
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	if err := os.Chdir(projectDir); err != nil {
+		t.Fatalf("chdir: %v", err)
+	}
+	t.Cleanup(func() { _ = os.Chdir(origDir) })
+
+	// No .gnoma/ dir at all — Upgrade() would error.
+	if rc := runUpgradeConfigCommand(nil); rc != 0 {
+		t.Errorf("rc = %d, want 0 for missing project config (friendly exit)", rc)
+	}
+}
+
+// TestRunUpgradeConfig_MissingGlobalConfigIsFriendly mirrors
+// the above for --global. The user-level config not existing
+// is also "nothing to upgrade", not an error.
+func TestRunUpgradeConfig_MissingGlobalConfigIsFriendly(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+	// Don't create the global config dir either.
+
+	if rc := runUpgradeConfigCommand([]string{"--global"}); rc != 0 {
+		t.Errorf("rc = %d, want 0 for missing global config (friendly exit)", rc)
+	}
+}
+
+// TestRunUpgradeConfig_GlobalFlagUpgradesGlobalConfig verifies
+// the --global flag actually points at the global config and
+// upgrades it.
+func TestRunUpgradeConfig_GlobalFlagUpgradesGlobalConfig(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	// Seed a global config with a default-equivalent field.
+	globalDir := filepath.Join(dir, "gnoma")
+	if err := os.MkdirAll(globalDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	globalPath := filepath.Join(globalDir, "config.toml")
+	if err := os.WriteFile(globalPath, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if rc := runUpgradeConfigCommand([]string{"--global"}); rc != 0 {
+		t.Errorf("rc = %d, want 0", rc)
+	}
+
+	got, _ := os.ReadFile(globalPath)
+	if strings.Contains(string(got), "max_tokens") {
+		t.Errorf("max_tokens at default not dropped from global config, got:\n%s", got)
+	}
+}
+
+// TestRunUpgradeConfig_GlobalWithExplicitPathIsError verifies
+// the mutually-exclusive-flag handling: --global and an
+// explicit path can't both be supplied.
+func TestRunUpgradeConfig_GlobalWithExplicitPathIsError(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	if rc := runUpgradeConfigCommand([]string{"--global", "/tmp/somewhere/config.toml"}); rc != 1 {
+		t.Errorf("rc = %d, want 1 for --global + explicit path", rc)
+	}
+}
+
+// TestRunUpgradeConfig_AllFlagWalksRegistry verifies the
+// --all mode: a registry with one project that has a
+// zero-spammed config gets that config upgraded.
+func TestRunUpgradeConfig_AllFlagWalksRegistry(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	// Seed a registry entry pointing at a project with a
+	// zero-spammed config.
+	projectDir := filepath.Join(dir, "project")
+	if err := os.MkdirAll(filepath.Join(projectDir, ".gnoma"), 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+	projectConfig := filepath.Join(projectDir, ".gnoma", "config.toml")
+	if err := os.WriteFile(projectConfig, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
+		t.Fatalf("seed project: %v", err)
+	}
+
+	reg, err := gnomacfg.LoadRegistry()
+	if err != nil {
+		t.Fatalf("LoadRegistry: %v", err)
+	}
+	if err := reg.Record(projectDir); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+
+	if rc := runUpgradeConfigCommand([]string{"--all"}); rc != 0 {
+		t.Errorf("rc = %d, want 0", rc)
+	}
+
+	// Project config should be cleaned.
+	got, err := os.ReadFile(projectConfig)
+	if err != nil {
+		t.Fatalf("read project config: %v", err)
+	}
+	if strings.Contains(string(got), "max_tokens") {
+		t.Errorf("max_tokens at default not dropped, got:\n%s", got)
+	}
+}
+
+// TestRunUpgradeConfig_AllFlagHandlesMissingProjectFiles
+// documents the "first-run" path: the registry might list
+// projects that haven't grown their config yet. The handler
+// should report "no such file" and exit 0.
+func TestRunUpgradeConfig_AllFlagHandlesMissingProjectFiles(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	// Seed a registry entry pointing at a project with NO
+	// .gnoma/config.toml.
+	projectDir := filepath.Join(dir, "project-no-config")
+	if err := os.MkdirAll(projectDir, 0o755); err != nil {
+		t.Fatalf("mkdir: %v", err)
+	}
+
+	reg, err := gnomacfg.LoadRegistry()
+	if err != nil {
+		t.Fatalf("LoadRegistry: %v", err)
+	}
+	if err := reg.Record(projectDir); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+
+	if rc := runUpgradeConfigCommand([]string{"--all"}); rc != 0 {
+		t.Errorf("rc = %d, want 0 (missing files are friendly exits)", rc)
+	}
+}
+
+// TestRunUpgradeConfig_AllFlagMutuallyExclusiveWithPath
+// verifies --all and an explicit path are mutually exclusive.
+func TestRunUpgradeConfig_AllFlagMutuallyExclusiveWithPath(t *testing.T) {
+	dir := t.TempDir()
+	t.Setenv("XDG_CONFIG_HOME", dir)
+
+	if rc := runUpgradeConfigCommand([]string{"--all", "/tmp/somewhere/config.toml"}); rc != 1 {
+		t.Errorf("rc = %d, want 1 for --all + explicit path", rc)
+	}
+}
@@ -24,27 +24,41 @@ The "ollama" path is the easiest if you're already running a local model — it

 ## Presets

-Presets use `reecdev/tiny3.5:500m` as the default model — a 500 M-parameter Qwen3.5 distillation with tool support, available on Ollama. Pull it once with:
+Presets use `qwen3:0.6b` as the default model — a 600 M-parameter Qwen3 instruction-tuned model with native `/no_think` support, available on Ollama. Pull it once with:

 ```bash
-ollama pull reecdev/tiny3.5:500m   # ~1 GB
-# or the 1.5 B variant for slightly better quality:
-ollama pull reecdev/tiny3.5:1.5b   # ~3 GB
+ollama pull qwen3:0.6b           # ~520 MB
 ```

+### Model choice notes
+
+Empirical testing (2026-05-25) across five candidate SLMs on identical prompts:
+
+| Model | Classifier success | Notes |
+|---|---|---|
+| `qwen3:0.6b` | consistent across trivial + knowledge prompts | recommended default; honours `/no_think` cleanly |
+| `functiongemma:270m` | works on trivial prompts, derails on knowledge ones | needs function-signature prompt rewrite or LoRA fine-tune to be reliable |
+| `gemma3:1b` | unusable | emits malformed JSON (just `{` or invented keys) |
+| `reecdev/tiny3.5:1.5b` | unusable | thinking-mode distillation; ignores `/no_think` and emits `<Thought Process>` blocks |
+| `qwen2.5-coder:1.5b` | unusable | code-completion-tuned; ignores the classifier prompt entirely and answers in prose |
+
 Substitute any small Ollama model you prefer. The probe at startup reads each model's actual capability — `tools` enables the SLM arm to handle simple file reads; without it, the SLM only handles knowledge-only prompts.

+If your SLM is task-specialised (function-call models like FunctionGemma; embedding-only models; code-completion-tuned models) and produces wrong-shape output when asked to answer a general prompt, set `register_as_arm = false` so the SLM stays classifier-only and execution routes to other local arms.
+
 ### Preset 1 — Ollama (recommended for most users)

 ```toml
 [slm]
-enabled = true
-backend = "ollama"
-model   = "reecdev/tiny3.5:500m"
+enabled         = true
+backend         = "ollama"
+model           = "qwen3:0.6b"
+register_as_arm = true              # default; set false for classifier-only models
+classify_timeout = "15s"            # default; bump for slow cold-load
 # base_url defaults to http://localhost:11434
 ```

-Prereq: `ollama pull reecdev/tiny3.5:500m` (or any model you'd rather use).
+Prereq: `ollama pull qwen3:0.6b` (or any model you'd rather use).

 ### Preset 2 — llama.cpp server

@@ -150,10 +164,10 @@ Output looks like:
 ```
 slm enabled: true
 slm backend: ollama
-  model:   reecdev/tiny3.5:500m
+  model:   qwen3:0.6b

 live probe:
-  ✓ ollama ready (model=reecdev/tiny3.5:500m, boot=0s)
+  ✓ ollama ready (model=qwen3:0.6b, boot=0s)
 ```

 Run a few prompts, then check:
@@ -1,5 +1,14 @@
 # Tool-Router Specialization (functiongemma) — 2026-05-23

+> **Companion plan from 2026-05-25:**
+> [`2026-05-25-encoder-bandit-router.md`](2026-05-25-encoder-bandit-router.md)
+> sketches an alternative architecture (encoder + contextual bandit
+> instead of decoder-SLM-as-classifier). The two are complementary,
+> not competing — FunctionGemma fits as the optional Phase 5 "JSON
+> sanity layer" in that plan. Decide which track to invest in based
+> on the did-switch-rate telemetry (this plan) vs the bandit-data
+> accumulation (companion plan).
+
 Follow-up to
 [`2026-05-19-post-slm-unlock.md`](2026-05-19-post-slm-unlock.md)
 Phase A, which shipped two-stage tool routing: round 1 sends a single
@@ -0,0 +1,344 @@
+# Encoder + Contextual-Bandit Router — 2026-05-25
+
+Proposes a long-arc architectural rethink of gnoma's routing layer:
+**replace the decoder-SLM-as-classifier design with an encoder-only
+embedding model feeding a contextual bandit policy**, and treat a
+strict tiny SLM (FunctionGemma-270M-it) as the optional "emit a
+structured route decision" layer rather than the primary classifier.
+
+Surfaced from external research (RouteLLM, ModernBERT, Gemma 3
+270M, Qwen3-Embedding, BGE-M3) brought into the 2026-05-25
+diagnostic session where gnoma's current decoder-SLM classifier
+exhibited a 100% failure rate across two model swaps
+(`reecdev/tiny3.5:1.5b`, `qwen2.5-coder:1.5b`).
+
+This plan is **strategic / multi-month**. Phase 1 below is the only
+piece scoped for near-term implementation; everything else hinges on
+the bandit-vs-SLM strategic decision tracked in the existing
+`Bandit selector — design decisions deferred` TODO entry.
+
+Sibling plans:
+[`2026-05-23-tool-router-specialization.md`](2026-05-23-tool-router-specialization.md)
+already covers the **FunctionGemma fine-tune** track as the
+strict-SLM option; this plan adds the **encoder + bandit** track
+as the alternative (and arguably better-suited) architecture.
+
+---
+
+## Problem
+
+The current router has three coupled problems:
+
+1. **The classifier is a decoder LLM in a job an encoder would do
+   better.** Routing is a classification task with cost/quality
+   trade-offs, not a reasoning task. Asking a decoder model to emit
+   structured JSON for every classify call is high-latency, fragile
+   to chain-of-thought leakage, and non-deterministic.
+
+2. **The bandit can't actually learn quality** because the only
+   success signal is `err == nil` (per `internal/engine/loop.go:118`).
+   EMA scores converge to 1.00 for every arm — see the 2026-05-24
+   `router stats` snapshot where 22 of 25 arm/task pairs sit at
+   exactly 1.00.
+
+3. **The classifier and bandit live in adjacent code but were
+   designed in separate phases**, so the integration point (`Task`
+   built by SLM classifier → fed to `selectBest`) is just data
+   flow, not a learning loop. The SLM's wins/losses don't update
+   the SLM; the bandit's wins/losses don't change which arms the
+   classifier considers.
+
+The 100% SLM-failure incident on 2026-05-25 made (1) urgent. The
+zero-discrimination EMA on 2026-05-24 made (2) urgent. (3) is the
+underlying integration debt.
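
To make problem (2) concrete, here is a minimal sketch of why a binary `err == nil` reward gives the EMA nothing to discriminate on. The smoothing factor (0.2), the starting score (0.5), and the function names are assumptions for illustration; they are not taken from `internal/engine/loop.go`.

```go
package main

import "fmt"

// updateEMA applies one exponential-moving-average step.
// alpha is a hypothetical smoothing factor, not a value from gnoma's code.
func updateEMA(prev, reward, alpha float64) float64 {
	return alpha*reward + (1-alpha)*prev
}

// simulate feeds n identical rewards into an EMA that starts at start.
func simulate(start, reward, alpha float64, n int) float64 {
	score := start
	for i := 0; i < n; i++ {
		score = updateEMA(score, reward, alpha)
	}
	return score
}

func main() {
	// Two arms with very different answer quality both return err == nil,
	// so both are fed reward 1.0 on every call and end up indistinguishable.
	strong := simulate(0.5, 1.0, 0.2, 40)
	weak := simulate(0.5, 1.0, 0.2, 40)
	fmt.Printf("strong=%.4f weak=%.4f\n", strong, weak)
}
```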
+
+---
+
+## Non-goals
+
+- **Killing the existing SLM classifier today.** Phase 1 of this
+  plan is purely additive (encoder feature extraction); the existing
+  classifier stays as a baseline until the new path is measurably
+  better.
+- **Reimplementing bandit math.** LinUCB and Thompson Sampling are
+  well-understood. The work is the feature pipeline and reward
+  function, not the policy core.
+- **Choosing a single embedding model permanently.** Phase 1 ships
+  with a default but exposes a `[slm.embedding].model` knob so
+  swapping is config-only.
+- **The strict-SLM track.** FunctionGemma fine-tuning is the sibling
+  `2026-05-23-tool-router-specialization.md` plan; this plan
+  references it but does not duplicate it.
+
+---
+
+## Background — research summary
+
+Citations follow the user-provided research thread (RouteLLM 2024,
+ModernBERT 2024, Google FunctionGemma 2025).
+
+- **RouteLLM** tested router types as a classification problem:
+  similarity routing, matrix factorization, BERT classifier, causal
+  LLM classifier. The BERT classifier was competitive with the
+  causal-LLM classifier at lower cost and latency. Routing is a
+  classification task; treating it like a generation task is paying
+  generation cost for classification value.
+- **ModernBERT** (Dec 2024) is an encoder-only model with 8k context,
+  trained partly on code, designed for fast classification and
+  retrieval. The 'base' size is ~150M parameters, the 'large' size
+  ~400M. Both are tiny compared to even small decoder LLMs.
+- **FunctionGemma-270M-it** (Aug 2025) is Google's small model
+  fine-tuned for natural-language → function-call output. Google's
+  own positioning materials list **query routing** as a use case.
+- **Qwen3-Embedding-0.6B** and **BGE-M3** are strong multilingual
+  embedding models with long-context support; either can serve as
+  feature extractors for downstream classification or bandit
+  policies.
+
+The throughline: **encoder models are the right tool for the
+classification side of routing**; generative SLMs (FunctionGemma)
+are the right tool only when the *output* must be a structured
+decision blob with confidence + tags + fallback. For pure routing,
+encoder features + bandit policy is cheaper, faster, and more
+deterministic.
+
+---
+
+## Approach overview
+
+Five phases. Phase 1 is near-term; Phases 2–3 are the actual
+architectural shift; Phase 4 is the long-arc fine-tune; Phase 5 is
+an optional add-on.
+
+### Phase 1 — Embedding feature scaffold (near-term, additive)
+
+Add an embedding pipeline that runs alongside the existing
+classifier. Extract features for every prompt; log them to disk
+next to the existing quality-EMA. No routing decision changes yet.
+
+**Why first:** lets us build up a labelled dataset of (prompt,
+features, arm, outcome) tuples without disturbing today's routing
+behaviour. Phase 2 trains against this dataset.
+
+### Phase 2 — Contextual bandit over the feature set
+
+Once Phase 1 has ~500–1000 labelled observations, swap `selectBest`
+from heuristic quality + EMA score to a LinUCB-style contextual
+bandit that takes the embedding features + the existing arm metadata
+(MaxComplexity, CostWeight, Strengths). The existing EMA quality
+score becomes one feature among many.
+
+### Phase 3 — Retire the decoder-SLM classifier
+
+When Phase 2 routing is measurably better than today's heuristic +
+EMA blend, the decoder-SLM classifier (currently producing 0
+useful classifications on the user's setup) is no longer
+load-bearing. Deprecate it; keep the same `[slm]` config knobs for
+backwards compatibility but route them to a different runtime path.
+
+### Phase 4 — ModernBERT fine-tune
+
+The off-the-shelf embedding model from Phase 1 (BGE-M3 or
+Qwen3-Embedding-0.6B by default) gives general-purpose embeddings.
+Phase 4 fine-tunes a router-specific classification head on top of
+ModernBERT-base using the labelled dataset accumulated since Phase
+1. Pure performance win; falls back gracefully to off-the-shelf
+embeddings if the fine-tune isn't loaded.
+
+### Phase 5 — FunctionGemma JSON sanity layer (optional)
+
+For users who want a structured route decision (arm + confidence +
+fallback) alongside or instead of the bandit output, plug
+FunctionGemma-270M-it (fine-tuned per the
+`tool-router-specialization` plan) as a final-stage decision blob
+emitter. Sits *after* the encoder + bandit, not in front of them.
+
+---
+
+## Phase 1 — Embedding feature scaffold (detailed)
+
+This is the only phase scoped for near-term implementation. The
+others depend on Phase 1's data accumulation.
+
+### What lands
+
+- New package `internal/router/features` with:
+  - `Embedder` interface: `Embed(ctx, prompt string) ([]float32, error)`.
+  - Implementations: `OllamaEmbedder`, `BGE3Embedder`, `NoopEmbedder`
+    (default; returns nil features when no embedding model is
+    configured).
+- New config `[slm.embedding]` section:
+  ```toml
+  [slm.embedding]
+  enabled  = false                       # default off; opt-in
+  backend  = "ollama"                    # ollama | bge-m3 | noop
+  model    = "qwen3-embedding:0.6b"      # ollama model tag
+  base_url = ""                          # backend endpoint override
+  ```
+- Feature extraction hook in `internal/engine/loop.go`: after the
+  classifier runs but before `selectBest`, compute the embedding
+  for the prompt and attach to the routing `Task` as an opaque
+  `Features []float32` field.
+- New on-disk store at `~/.config/gnoma/router-features.jsonl`,
+  one record per observation: `{ts, prompt_hash, features,
+  task_type, arm_id, success, tokens, duration}`.
+  - `prompt_hash` is a SHA-256 of the prompt — never the prompt
+    itself — to keep the file local-only-but-not-secret-laden.
+  - Append-only, atomic-write, incognito-gated, same discipline as
+    the firewall audit log.
+- No selector change. `selectBest` continues to use today's
+  heuristic + EMA blend. Phase 1 just observes.
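
The pieces above can be sketched roughly as follows. The `Embedder` signature and the record's field names come from this plan; everything else (the Go types, `duration` stored as milliseconds, the `observe` and `demoObservation` helpers) is an assumption for illustration, not the final package layout.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// Embedder matches the interface named in the plan.
type Embedder interface {
	Embed(ctx context.Context, prompt string) ([]float32, error)
}

// NoopEmbedder is the default: no embedding model configured, so no features.
type NoopEmbedder struct{}

func (NoopEmbedder) Embed(context.Context, string) ([]float32, error) { return nil, nil }

// Observation is one line of router-features.jsonl. The field set follows the
// plan; the Go types and the millisecond unit are assumptions.
type Observation struct {
	TS         time.Time `json:"ts"`
	PromptHash string    `json:"prompt_hash"`
	Features   []float32 `json:"features"`
	TaskType   string    `json:"task_type"`
	ArmID      string    `json:"arm_id"`
	Success    bool      `json:"success"`
	Tokens     int       `json:"tokens"`
	DurationMS int64     `json:"duration"`
}

// hashPrompt returns the hex SHA-256 of the prompt so the raw text never
// reaches the store.
func hashPrompt(prompt string) string {
	sum := sha256.Sum256([]byte(prompt))
	return hex.EncodeToString(sum[:])
}

// observe embeds the prompt and builds a record. An embedding failure only
// drops the features; it never blocks the caller.
func observe(ctx context.Context, e Embedder, prompt, taskType, armID string,
	ok bool, tokens int, d time.Duration) Observation {
	features, err := e.Embed(ctx, prompt)
	if err != nil {
		features = nil
	}
	return Observation{
		TS:         time.Now().UTC(),
		PromptHash: hashPrompt(prompt),
		Features:   features,
		TaskType:   taskType,
		ArmID:      armID,
		Success:    ok,
		Tokens:     tokens,
		DurationMS: d.Milliseconds(),
	}
}

// demoObservation builds a record with made-up task and arm names.
func demoObservation() Observation {
	return observe(context.Background(), NoopEmbedder{}, "explain this stack trace",
		"debug", "local/example-arm", true, 128, 250*time.Millisecond)
}

func main() {
	line, err := json.Marshal(demoObservation())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(line)) // one JSONL record; the prompt text itself is absent
}
```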
+
+### Why off by default
+
+Embedding inference adds 50–200ms per prompt depending on backend
+and model size. That latency is fine for Ollama users running on
+a workstation but painful for users on slower setups. Keeping the
+feature opt-in means default users see no regression.
+
+### Phase 1 task list
+
+- **F1-1:** Define the `Embedder` interface and `NoopEmbedder` in
+  `internal/router/features/`.
+- **F1-2:** `OllamaEmbedder` wraps `provider/openaicompat` with the
+  ollama embedding endpoint (`/api/embeddings`).
+- **F1-3:** Add the `[slm.embedding]` config section to
+  `internal/config/config.go` with the same defaults-via-zero
+  discipline as the rest of the config.
+- **F1-4:** Wire the embedder into `loop.go` between classifier and
+  selector. Failures log at Debug and don't block routing.
+- **F1-5:** Append-only feature store in
+  `~/.config/gnoma/router-features.jsonl` with atomic writes,
+  incognito gate, opt-out via `[slm.embedding].enabled = false`.
+- **F1-6:** Tests covering: embedder mock + observation record;
+  noop embedder produces empty features; incognito skips the
+  store entirely.
+
+---
+
+## Phase 2+ — Bandit policy (sketch only; needs data first)
+
+Spelled out for context. Not for near-term implementation.
+
+### Feature set per the research
+
+```
+prompt_embedding          — 384-1024 dim depending on model
+token_count               — len of tokenized prompt
+language                  — ISO code from a small lang-detect
+has_code                  — fenced-block heuristic
+has_error_log             — pattern match for stack traces
+needs_tools               — from current heuristic
+needs_vision              — from [Image:...] markers
+estimated_complexity      — current heuristic score
+requested_latency         — turn-budget hint (future)
+arm_context_window        — from arm metadata
+arm_vram_cost             — from arm metadata
+arm_avg_latency           — from quality EMA
+arm_success_rate          — from quality EMA
+```
+
+### Reward function per the research
+
+```
+reward = quality_score
+       - latency_penalty
+       - vram_penalty
+       - failure_penalty
+       - escalation_penalty
+```
+
+- `quality_score`: 1.0 on success, 0.0 on hard error today; richer
+  signal (elf-mediated, user thumbs, tool-call success) once the
+  TODO `Bandit selector — design decisions deferred` resolves.
+- `latency_penalty`: monotone in observed seconds.
+- `vram_penalty`: monotone in declared VRAM cost.
+- `failure_penalty`: hard cost on explicit errors (sandbox
+  denied, parse failed).
+- `escalation_penalty`: cost when a downstream elf had to escalate
+  to a heavier arm because this arm failed.
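
A minimal sketch of how the terms above could combine, assuming linear penalties (which are monotone, as required). The struct fields, the weight values, and the choice of linear form are all placeholders; the plan does not fix them.

```go
package main

import "fmt"

// Outcome holds the raw signals for one routed turn. Field names are hypothetical.
type Outcome struct {
	Success     bool
	LatencySec  float64
	VRAMCostGB  float64
	HardFailure bool // explicit error such as sandbox denied or parse failed
	Escalated   bool // a downstream elf had to retry on a heavier arm
}

// Weights are placeholder coefficients; real values would need tuning.
type Weights struct {
	Latency, VRAM, Failure, Escalation float64
}

// reward computes quality minus the four penalties from the plan.
func reward(o Outcome, w Weights) float64 {
	quality := 0.0
	if o.Success {
		quality = 1.0
	}
	r := quality - w.Latency*o.LatencySec - w.VRAM*o.VRAMCostGB
	if o.HardFailure {
		r -= w.Failure
	}
	if o.Escalated {
		r -= w.Escalation
	}
	return r
}

func main() {
	w := Weights{Latency: 0.01, VRAM: 0.02, Failure: 0.5, Escalation: 0.25}
	fast := reward(Outcome{Success: true, LatencySec: 2, VRAMCostGB: 1}, w)
	slow := reward(Outcome{Success: true, LatencySec: 30, VRAMCostGB: 8}, w)
	fmt.Printf("fast=%.3f slow=%.3f\n", fast, slow)
}
```

Unlike the binary signal used today, two successful turns now score differently when one was slower or heavier.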
+
+### Policy
+
+LinUCB (linear contextual bandit, deterministic exploration
+bounded by UCB) or Thompson Sampling (Bayesian, smoother
+exploration). LinUCB is the safer starting point — fewer
+hyperparameters, well-known behaviour, easier to debug.
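
For orientation, a compact sketch of disjoint LinUCB (one linear model per arm). It keeps the inverse matrix up to date with the Sherman-Morrison rank-one formula so no matrix inversion is needed per step. The type and method names, the two-dimensional toy context, and `alpha = 0.5` are assumptions; a real implementation would use the Phase 1 feature vector and a tuned exploration weight.

```go
package main

import (
	"fmt"
	"math"
)

// arm keeps LinUCB statistics: aInv is the inverse of A = I + sum(x x^T),
// and b = sum(reward * x).
type arm struct {
	aInv [][]float64
	b    []float64
}

// LinUCB is a disjoint linear contextual bandit over a fixed arm set.
type LinUCB struct {
	alpha float64 // exploration weight
	arms  []arm
}

func NewLinUCB(nArms, dim int, alpha float64) *LinUCB {
	l := &LinUCB{alpha: alpha, arms: make([]arm, nArms)}
	for i := range l.arms {
		inv := make([][]float64, dim)
		for r := range inv {
			inv[r] = make([]float64, dim)
			inv[r][r] = 1 // A starts as the identity, so its inverse does too
		}
		l.arms[i] = arm{aInv: inv, b: make([]float64, dim)}
	}
	return l
}

func matVec(m [][]float64, v []float64) []float64 {
	out := make([]float64, len(m))
	for i, row := range m {
		for j, x := range row {
			out[i] += x * v[j]
		}
	}
	return out
}

func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// Score returns the upper confidence bound: theta.x + alpha*sqrt(x^T A^-1 x).
func (l *LinUCB) Score(i int, x []float64) float64 {
	a := l.arms[i]
	theta := matVec(a.aInv, a.b)
	return dot(theta, x) + l.alpha*math.Sqrt(dot(x, matVec(a.aInv, x)))
}

// Select returns the arm with the highest score; the lowest index wins ties.
func (l *LinUCB) Select(x []float64) int {
	best, bestScore := 0, math.Inf(-1)
	for i := range l.arms {
		if s := l.Score(i, x); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

// Update folds in one observation via the Sherman-Morrison update of A^-1.
func (l *LinUCB) Update(i int, x []float64, reward float64) {
	a := &l.arms[i]
	ax := matVec(a.aInv, x)
	denom := 1 + dot(x, ax)
	for r := range a.aInv {
		for c := range a.aInv[r] {
			a.aInv[r][c] -= ax[r] * ax[c] / denom
		}
	}
	for k := range a.b {
		a.b[k] += reward * x[k]
	}
}

func main() {
	bandit := NewLinUCB(2, 2, 0.5)
	x := []float64{1, 0}
	for i := 0; i < 20; i++ {
		bandit.Update(0, x, 1) // arm 0 keeps succeeding on this context
		bandit.Update(1, x, 0) // arm 1 keeps failing on it
	}
	fmt.Println("selected arm:", bandit.Select(x))
}
```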
+
+---
+
+## Risks
+
+- **Latency.** Embedding inference adds 50–200ms per prompt. Phase
+  1's opt-in default means users see no regression; Phase 2's
+  "make it default" decision requires latency benchmarks first.
+- **Data sparsity for fine-tuning (Phase 4).** ModernBERT
+  fine-tuning needs ~10k labelled observations to start being
+  useful. Phase 1 might run for months before Phase 4 is viable.
+  Plan B: synthesise labels from existing prompt logs + rule-based
+  pre-labels.
+- **Off-the-shelf embedding quality.** BGE-M3 / Qwen3-Embedding
+  weren't trained specifically for routing decisions. Phase 4
+  exists precisely to close this gap; Phase 1's data accumulation
+  is what makes Phase 4 possible.
+- **Architectural complexity.** This plan introduces an entire new
+  ML pipeline (embedder → feature store → bandit → reward loop).
+  Phase 1 keeps it side-by-side with the existing path; Phase 2's
+  "swap" decision is reversible because the existing path stays
+  in code.
+- **Privacy.** Prompt hashes (not raw prompts) in the feature
+  store. Still a local-only file; same opt-out plumbing as the
+  project registry from the config-migration plan.
+
+---
+
+## Open questions
+
+- **Should the feature store be per-project or global?** Per-project
+  is more privacy-respecting (one project's prompts don't influence
+  another's routing). Global is more data-efficient (more samples
+  → better bandit). Phase 1 chooses global by default; revisit
+  during Phase 2.
+- **How does this interact with `[router].prefer = local|cloud`?**
+  Easy answer: prefer policy stays as a hard tier-shift, applied
+  after bandit selection. Bandit picks the best feasible arm; the
+  prefer policy is consulted as a final filter / weight.
+- **What about CLI-agent subprocess arms?** They proxy to cloud but
+  run locally; today's `prefer` treats them as non-local. Bandit
+  features should include `is_subprocess` as a distinct feature
+  so the policy can learn the user's preferences for those arms
+  independent of local/cloud.
+- **Cold start.** With no observations, the bandit defaults to
+  pure exploration. Should we seed with the existing heuristic
+  defaults from `internal/router/defaults.go`? Probably yes —
+  warm-start with the curated Strengths as priors.
+
+---
+
+## Rollout
+
+- **Phase 1** ships as v0.5.0 (additive, opt-in, no behaviour
+  change by default). Schema-touching so warrants a minor bump.
+- **Phase 2** ships when Phase 1 has accumulated enough data
+  (~500–1000 observations per user) — opt-in via
+  `[router].bandit_policy = "linucb"` initially, becoming default
+  in a later release once measured better.
+- **Phase 3 (deprecation of decoder-SLM classifier)** is a v0.6.x
+  conversation, gated on Phase 2 measurably outperforming.
+- **Phase 4 (ModernBERT fine-tune)** is v0.7+ — requires the
+  fine-tuned model artifact distributed via Ollama or HF, plus
+  the auto-download story.
+- **Phase 5 (FunctionGemma sanity layer)** is independent of all
+  of the above; lands when the sibling `tool-router-specialization`
+  plan justifies it on did-switch-rate telemetry.
+
+---
+
+## Cross-references
+
+- TODO.md entry "Bandit selector — design decisions deferred" —
+  the strategic question this plan answers in the long run.
+- TODO.md entry "Tool-router specialization (functiongemma)" — the
+  sibling track; complementary, not competing.
+- [`2026-05-23-tool-router-specialization.md`](2026-05-23-tool-router-specialization.md) — FunctionGemma fine-tune plan.
+- [`2026-05-07-gnoma-roadmap.md`](2026-05-07-gnoma-roadmap.md) §Phase 4 — the original "re-evaluate bandit learning" entry.
+- 2026-05-25 diagnostic session (this conversation) — the trigger.
@@ -0,0 +1,375 @@
+# Agent Client Protocol (ACP) — 2026-06-04
+
+Adds **both directions** of ACP to gnoma:
+
+1. **gnoma as ACP agent (server)** — `gnoma acp` over stdio so any
+   ACP-capable editor (Zed, Kiro, OpenCode, …) can drive gnoma as an
+   external coding agent.
+2. **gnoma as ACP client** — gnoma spawns *external* ACP agents
+   (Claude, Gemini CLI, Codex, …) and exposes them as router-arm
+   provider backends, the standardized successor to the current
+   `internal/provider/subprocess` CLI-agent arms.
+
+Adds the TODO.md entry "Agent Client Protocol (ACP) support".
+
+Upstream: <https://github.com/agentclientprotocol> ·
+spec <https://agentclientprotocol.com>
+
+---
+
+## Problem
+
+ACP is "the LSP for AI coding agents": a JSON-RPC 2.0 protocol, spoken
+over stdio, that lets editors (clients) spawn agents (subprocesses) and
+talk to them in a standard way — eliminating point-to-point editor↔agent
+integrations. Zed, Kiro, OpenCode and others are clients; Claude, Gemini
+CLI, Codex ship as ACP agents.
+
+Today gnoma is reachable only via its own TUI and pipe mode. It cannot
+plug into an editor's agent panel. Supporting ACP makes gnoma a drop-in
+agent inside any ACP client, which is a large distribution surface for
+near-zero ongoing cost — the protocol is stable and gnoma already owns
+all the hard parts (an agentic engine, tools, permissions, MCP).
+
+### Why this is a natural fit
+
+- gnoma already speaks **JSON-RPC over stdio** for MCP
+  (`internal/mcp/jsonrpc.go` `Request`/`Notification`,
+  `internal/mcp/transport*.go`) — that machinery is reusable for the
+  ACP server side (gnoma is the *server* of the JSON-RPC channel here,
+  the mirror of its MCP-client role).
+- The agentic loop is already factored behind
+  `session.Session` (`internal/session/session.go:54`,
+  `Local.Send`/`SendWithOptions` at `local.go:80-85`) driving
+  `engine.Engine` (`internal/engine/engine.go`). ACP `session/prompt`
+  maps onto one `Send`.
+- Permissions already route through a pluggable prompt function
+  (`permission.NewChecker(mode, rules, promptFn)`,
+  `cmd/gnoma/main.go:668`). ACP's `session/request_permission` callback
+  is just another `promptFn` implementation.
+- ACP `session/new` can declare the `mcpServers` the agent should
+  connect to — gnoma already has an MCP manager
+  (`internal/mcp/manager.go`) to honour that in the same handshake.
+
+### Role decision — both, server first
+
+Both roles ship under this plan. Sequence them: **agent (server)
+first** — it's the larger distribution win and exercises the wire
+protocol end-to-end — then **client**, which reuses the same
+`internal/acp` protocol/types from the other side. They share the
+JSON-RPC framing, content-block translation, and capability structs;
+only the dispatch direction differs.
+
+The client role is the standardized successor to
+`internal/provider/subprocess`: that package shells out to CLI agents
+with one-shot `--output-format stream-json` (or prompt-augmentation
+fallback), runs the agent's *own* loop with `--yolo`/`--trust`, and
+cannot surface structured tool calls (it sets `ToolUse:false` for
+agents lacking stream-json — see TODO "Native agy JSON output"). ACP
+fixes all of that: a persistent JSON-RPC session, structured
+`session/update` tool-call events, real permission round-trips, and
+cancellation.
+
+### No Go SDK exists
+
+Official SDKs are TypeScript, Python, Rust, Kotlin — **no Go**. gnoma
+implements the wire protocol natively against the published JSON
+schema. Pin the supported `protocolVersion` and the exact method set
+against the spec at implementation time (the protocol is young and
+still moving).
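
As a rough picture of what "natively" means at the wire level, the sketch below encodes and decodes JSON-RPC 2.0 envelopes as one JSON object per line. The envelope fields are standard JSON-RPC 2.0; the newline-delimited framing, the integer-only `id`, the example params, and all Go names are assumptions to be checked against the ACP spec and `internal/mcp/jsonrpc.go` at implementation time.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// Message is a generic JSON-RPC 2.0 envelope. A request carries ID and Method,
// a notification carries Method only, a response carries ID plus Result or Error.
// JSON-RPC also allows string IDs; this sketch handles integers only.
type Message struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      *int64          `json:"id,omitempty"`
	Method  string          `json:"method,omitempty"`
	Params  json.RawMessage `json:"params,omitempty"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   *RPCError       `json:"error,omitempty"`
}

type RPCError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// IsNotification reports whether the message expects no reply.
func (m Message) IsNotification() bool { return m.Method != "" && m.ID == nil }

// encodeLine marshals one message and terminates it with a newline.
func encodeLine(m Message) ([]byte, error) {
	m.JSONRPC = "2.0"
	b, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// decodeLines parses newline-delimited messages. bufio.Scanner's default
// token limit applies, so a real reader would need a larger buffer.
func decodeLines(data []byte) ([]Message, error) {
	var out []Message
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue
		}
		var m Message
		if err := json.Unmarshal(line, &m); err != nil {
			return nil, err
		}
		out = append(out, m)
	}
	return out, sc.Err()
}

// demoRoundTrip writes a request and a notification, then reads them back.
// The params payloads are illustrative, not copied from the ACP schema.
func demoRoundTrip() ([]Message, error) {
	id := int64(1)
	var wire bytes.Buffer
	for _, m := range []Message{
		{ID: &id, Method: "initialize", Params: json.RawMessage(`{"protocolVersion":1}`)},
		{Method: "session/cancel", Params: json.RawMessage(`{"sessionId":"s1"}`)},
	} {
		line, err := encodeLine(m)
		if err != nil {
			return nil, err
		}
		wire.Write(line)
	}
	return decodeLines(wire.Bytes())
}

func main() {
	msgs, err := demoRoundTrip()
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s notification=%v\n", m.Method, m.IsNotification())
	}
}
```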
+
+---
+
+## Non-goals
+
+- **A full editor UI.** In agent mode gnoma renders nothing; the client
+  owns the UI. gnoma emits `session/update` notifications and the client
+  displays them.
+- **Replacing the TUI / pipe modes.** ACP agent mode is a third entry
+  mode alongside them, not a replacement.
+- **Replacing `internal/provider/subprocess` outright.** The ACP-client
+  provider is added alongside it; the stream-json subprocess path stays
+  for agents that don't (yet) speak ACP. Deprecation is a later call.
+- **Custom transports.** stdio only (the ACP norm: local agent as a
+  subprocess). No socket/HTTP transport.
+- **gnoma-drives-gnoma over ACP as the default.** gnoma's native
+  providers/router remain the primary path; ACP-client arms are an
+  additional backend source.
+
+---
+
+## Design
+
+The two roles share one package (`internal/acp`): JSON-RPC framing,
+content-block translation, and the capability/handshake types are
+direction-agnostic. **Part A** is the agent (server) side; **Part B**
+is the client side. Build Part A first.
+
+## Part A — gnoma as ACP agent (server)
+
+### New entry mode: `gnoma acp`
+
+Add a third mode beside TUI and pipe (mode is chosen near
+`cmd/gnoma/main.go:106-114`). Selected by an explicit `acp` subcommand
+(stdio is shared with the JSON-RPC channel, so it can't be
+TTY-autodetected the way TUI is). In ACP mode:
+
+- **No banner, no TUI, no stdout chatter.** stdout/stdin are the
+  JSON-RPC pipe; all human/diagnostic logging goes to **stderr** only
+  (the firewall/audit slog sink must not write to stdout). Audit this
+  carefully — any stray stdout write corrupts the protocol stream.
+- Reuse the existing session/engine/router/security construction; only
+  the front-end loop differs.
+
+### Package layout
+
+```
+internal/acp/
+  protocol.go   // ACP types: handshake, capabilities, content blocks (shared)
+  jsonrpc.go    // framing reused/forked from internal/mcp/jsonrpc.go (shared)
+  content.go    // ContentBlock <-> message.Message translation (shared)
+  server.go     // Part A: stdio JSON-RPC read loop; method dispatch
+  session.go    // Part A: ACP session <-> gnoma session.Session bridge
+  permission.go // Part A: session/request_permission promptFn
+  update.go     // Part A: gnoma stream events -> session/update
+  client.go     // Part B: spawn external agent, drive the handshake/prompt
+```
+
+A separate `internal/provider/acp/` holds the **Part B provider**
+adapter (mirrors `internal/provider/subprocess/`), depending on
+`internal/acp/client.go`.
+
+Reuse `internal/mcp/jsonrpc.go` framing if it generalises; otherwise
+fork the minimal envelope (it's tiny). Keep ACP types separate from MCP
+types — they are different protocols that happen to share JSON-RPC.
+
+### Method handlers (agent side)
+
+Map each ACP method to existing gnoma machinery. Pin exact shapes to the
+spec; the mapping is the contract:
+
+| ACP method (client→agent) | gnoma handling |
+|---|---|
+| `initialize` | Reply with `agentCapabilities` (tools, MCP support, prompt streaming, permission modes), `agentInfo` (name "gnoma", `buildVersion`). Negotiate `protocolVersion`. |
+| `session/new` | Build a `session.Local` (router, security, tools wired as in main). Honour `cwd` (run it through `safety.ClassifyCWD`), and connect any `mcpServers` the client declares via `internal/mcp/manager.go`. Return a `sessionId`. |
+| `session/load` (if advertised) | Rehydrate from `internal/session` store (`SessionStore.Load`). Optional — only if we advertise the capability. |
+| `session/prompt` | Translate ACP `ContentBlock`s → `message.Message`, call `Send`/`SendWithOptions`, stream results back as `session/update`, return the stop reason. |
+| `session/cancel` (notification) | Cancel the in-flight turn's context. |
+
+Agent→client calls gnoma must make:
+
+| ACP call (agent→client) | Trigger |
+|---|---|
+| `session/update` (notification) | Per engine stream event: assistant text deltas, tool-call start/args/result, plan/thoughts, token usage. Map gnoma's stream iterator (`Next/Current`) to update variants. |
+| `session/request_permission` | gnoma's `permission.Checker` promptFn — instead of console `Scanln`, send this and await the client's allow/deny (with the ACP "allow once / always" options mapped to gnoma permission modes). |
+| `fs/read_text_file`, `fs/write_text_file` | **If** we advertise client-side fs and the client supports it, route the `fs` tools through the client so edits show in the editor's buffers. Otherwise gnoma's own `internal/tool/fs` operates on disk directly. Decide per capability negotiation. |
+
+### Streaming bridge
+
+The engine produces a pull-based stream (`Next() / Current() / Err() /
+Close()`). The ACP bridge consumes it and emits a `session/update` per
+event. Backpressure: ACP is fire-and-forget notifications, so no
+blocking — but coalesce text deltas if the client is slow (config knob,
+default flush per token).
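
One possible shape for the bridge, reduced to text deltas only. The `Stream` interface copies the method set named above but simplifies `Current()` to return a string; count-based batching stands in for whatever "client is slow" detection the real knob uses. All names here are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Stream mirrors the pull-based iterator shape, simplified to text deltas.
type Stream interface {
	Next() bool
	Current() string
	Err() error
	Close() error
}

// sliceStream is a stand-in for the engine stream.
type sliceStream struct {
	items []string
	i     int
}

func (s *sliceStream) Next() bool {
	if s.i >= len(s.items) {
		return false
	}
	s.i++
	return true
}
func (s *sliceStream) Current() string { return s.items[s.i-1] }
func (s *sliceStream) Err() error      { return nil }
func (s *sliceStream) Close() error    { return nil }

// bridge drains the stream and calls emit once per batch of up to batchSize
// deltas. batchSize 1 means flush per token; emit stands in for sending one
// session/update notification.
func bridge(s Stream, batchSize int, emit func(text string)) error {
	defer s.Close()
	if batchSize < 1 {
		batchSize = 1
	}
	var buf strings.Builder
	pending := 0
	for s.Next() {
		buf.WriteString(s.Current())
		pending++
		if pending >= batchSize {
			emit(buf.String())
			buf.Reset()
			pending = 0
		}
	}
	if pending > 0 {
		emit(buf.String()) // flush the final partial batch
	}
	return s.Err()
}

// collect runs bridge over fixed deltas and returns the emitted updates.
func collect(deltas []string, batchSize int) []string {
	var updates []string
	_ = bridge(&sliceStream{items: deltas}, batchSize, func(text string) {
		updates = append(updates, text)
	})
	return updates
}

func main() {
	deltas := []string{"Hel", "lo", ", ", "wor", "ld"}
	fmt.Printf("%q\n", collect(deltas, 1))
	fmt.Printf("%q\n", collect(deltas, 2))
}
```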
+
+### Security & safety interplay
+
+- The `SafeProvider` firewall boundary and the per-session audit log
+  apply unchanged — ACP is a front-end, providers/tools sit behind the
+  same security layer.
+- `safety.ClassifyCWD` runs on the `session/new` `cwd`; a `refuse`
+  classification returns an ACP error rather than starting the session.
+- Egress allowlist (`2026-06-04-egress-allowlist.md`) applies as usual.
+- Incognito: expose a way to start an ACP session incognito (capability
+  flag or `session/new` param) so editor-driven sessions can be
+  non-persistent.
+
+### MCP-in-ACP
+
+When `session/new` lists `mcpServers`, spin them up through the existing
+manager so the editor's MCP config and gnoma's converge in one
+handshake (this is the headline ACP×MCP integration). gnoma's own
+config-level MCP servers still load too; merge, don't replace.
+
+---
+
+## Part B — gnoma as ACP client (external agents as router arms)
+
+gnoma connects to external ACP agents and exposes each as a router-arm
+backend, the standardized successor to `internal/provider/subprocess`.
+gnoma plays the *client* (editor) side of the JSON-RPC channel.
+
+### Provider adapter
+
+Add `internal/provider/acp/` implementing the `provider.Provider`
+contract (`Stream`, `Name`, `Models`, `DefaultModel`) — the same surface
+the subprocess provider satisfies
+(`internal/provider/subprocess/provider.go:28-62`):
+
+- **Spawn + handshake.** On first use (or at discovery), spawn the agent
+  subprocess (`exec.CommandContext`, with the Windows/Unix process-group
+  handling from `2026-06-04-cross-platform.md`), send `initialize` as the
+  client, then `session/new` with gnoma's `cwd` and — crucially —
+  gnoma's *own* MCP servers passed through as the `mcpServers` list so
+  the external agent shares gnoma's tool surface.
+- **`Stream` → `session/prompt`.** Translate the gnoma `Request`
+  messages into ACP `ContentBlock`s, send `session/prompt`, and turn the
+  incoming `session/update` notifications back into gnoma's pull-based
+  stream events (`EventTextDelta`, structured tool-call events, usage).
+  This is the win over the subprocess provider: tool calls arrive
+  **structured**, not as opaque `EventTextDelta` text.
+- **Permission callbacks.** The external agent sends
+  `session/request_permission` to gnoma (now the client). Route these
+  through gnoma's existing `permission.Checker` so the *user's* gnoma
+  permission policy governs the sub-agent — a strict improvement over
+  today's `--yolo`/`--trust` subprocess invocations that bypass gnoma's
+  gate entirely.
+- **`fs/*` callbacks.** Route the agent's file reads/writes through
+  gnoma's `internal/tool/fs` guard so the path-safety boundary still
+  applies.
+- **Cancellation.** gnoma's turn-cancel sends ACP `session/cancel`.
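+
+The permission-callback routing above can be sketched as follows. This
+is a minimal illustration under stated assumptions: `Checker`,
+`denyList`, `permissionRequest`, and the `"allow"`/`"deny"` outcome
+strings are hypothetical stand-ins. gnoma's real `permission.Checker`
+and the ACP request/response shapes may differ and should be taken from
+the pinned schema.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// Checker is a hypothetical stand-in for gnoma's permission.Checker.
+type Checker interface {
+	Allow(tool string) bool
+}
+
+// denyList is a toy policy: every tool is allowed unless it is listed.
+type denyList map[string]bool
+
+func (d denyList) Allow(tool string) bool { return !d[tool] }
+
+// permissionRequest is a simplified, assumed shape for the params of an
+// inbound session/request_permission call. It is not the real ACP schema.
+type permissionRequest struct {
+	SessionID string `json:"sessionId"`
+	Tool      string `json:"tool"`
+}
+
+// handlePermission decodes the sub-agent's request and answers it from the
+// user's policy, so the sub-agent can never approve its own tool call.
+func handlePermission(c Checker, params []byte) (string, error) {
+	var req permissionRequest
+	if err := json.Unmarshal(params, &req); err != nil {
+		return "", fmt.Errorf("decode permission request: %w", err)
+	}
+	if c.Allow(req.Tool) {
+		return "allow", nil
+	}
+	return "deny", nil
+}
+
+func main() {
+	policy := denyList{"bash": true}
+	for _, tool := range []string{"fs.read", "bash"} {
+		params, err := json.Marshal(permissionRequest{SessionID: "s1", Tool: tool})
+		if err != nil {
+			panic(err)
+		}
+		outcome, err := handlePermission(policy, params)
+		fmt.Println(tool, outcome, err)
+	}
+}
+```
+
+The point of the shape is that the decision lives on gnoma's side of
+the channel: the sub-agent only ever sees the outcome.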
+
+### Discovery & registration
+
+Mirror the subprocess flow (`cmd/gnoma/main.go:521-531`):
+
+- Discover ACP agents from config (`[acp.agents]` — command + args +
+  optional capability hints) and/or a known-agents table analogous to
+  `subprocess/agent.go:60` (`knownAgents`).
+- Register each as a `router.Arm` (a new `IsACPAgent` flag, or reuse
+  `IsCLIAgent` with a transport discriminant). Set `Capabilities` from
+  the ACP `initialize` response — notably `ToolUse:true`, which the
+  subprocess provider often can't claim.
+- Wrap in `security.WrapProvider(..., fwRef)` exactly like every other
+  arm so the firewall + audit + egress boundaries hold.
+
+### Relationship to the subprocess provider
+
+Additive. Agents that speak ACP (Claude, Gemini CLI, Codex increasingly
+do) get the ACP arm; agents that only do one-shot stream-json keep the
+subprocess arm. Where both exist for one binary, prefer ACP. This also
+unblocks the "Native agy JSON output" backlog item for any agent that
+exposes ACP instead of `--output-format stream-json`.
+
+---
+
+## Touch-points (file:line)
+
+**Part A — agent (server):**
+
+| Change | Location |
+|---|---|
+| New ACP package | `internal/acp/` |
+| Entry mode dispatch | `cmd/gnoma/main.go` (mode select ~`:106`, subcommand dispatch ~`:178`) |
+| stdout→stderr log discipline | logger setup (`main.go:100-114`) |
+| Session bridge | `internal/session` (`Session`/`Local`) |
+| Permission callback | `internal/permission` checker promptFn (`main.go:645-668`) |
+| Stream→update | engine stream iterator (`internal/engine`, `internal/stream`) |
+| MCP per-session | `internal/mcp/manager.go` |
+| JSON-RPC framing reuse | `internal/mcp/jsonrpc.go` |
+
+**Part B — client (external agents as arms):**
+
+| Change | Location |
+|---|---|
+| ACP-client provider | new `internal/provider/acp/` (mirrors `internal/provider/subprocess/`) |
+| Client handshake/driver | `internal/acp/client.go` |
+| Arm discovery + registration | `cmd/gnoma/main.go:521-531` (subprocess pattern), `[acp.agents]` config |
+| Known-agents table | analogous to `internal/provider/subprocess/agent.go:60` |
+| Arm flag | `router.Arm` (`IsACPAgent`, or `IsCLIAgent` + transport) |
+| Security wrap | `security.WrapProvider(..., fwRef)` |
+
+---
+
+## Testing (TDD — write first)
+
+- **Protocol unit tests (no real provider):**
+  - `initialize` handshake: version negotiation, advertised
+    capabilities are stable and accurate.
+  - `session/new` → returns a sessionId; honours `cwd`; rejects a
+    `refuse`-classified cwd with an ACP error.
+  - `session/prompt` with a stubProvider: ContentBlocks translate in,
+    `session/update`s stream out in order, correct stop reason.
+  - `session/cancel` aborts the in-flight turn (context cancellation
+    observed).
+  - Permission: a tool call triggers `session/request_permission`; a
+    "deny" response blocks the tool; "allow always" updates the mode.
+  - **stdout purity test:** drive a full prompt and assert stdout
+    contains *only* valid JSON-RPC frames (no banner/log leakage) — this
+    is the most common ACP-agent bug.
+- **Conformance:** run gnoma against the upstream ACP test client /
+  example client (Rust/TS) in a `//go:build integration` test if one is
+  available; otherwise a recorded-transcript fixture.
+- **MCP-in-ACP:** `session/new` with an `mcpServers` entry spins the
+  server up and its tools become callable in that session.
+- **Part B (client) unit tests** — drive a *fake ACP agent* (a small
+  in-process JSON-RPC responder, the mirror of the agent-side tests):
+  - Provider `Stream` performs `initialize`+`session/new`+`session/prompt`
+    and yields gnoma stream events in order, with **structured** tool-call
+    events (not opaque text).
+  - An inbound `session/request_permission` is routed through
+    `permission.Checker` and a deny blocks the call.
+  - An inbound `fs/write_text_file` is mediated by the `internal/tool/fs`
+    guard (a guarded path is refused).
+  - Turn cancel emits `session/cancel`; the subprocess is reaped (tie to
+    cross-platform process-group handling).
+  - Discovery registers a fake ACP agent as an arm with `ToolUse:true`.
+- **Round-trip (loopback):** point gnoma's ACP-*client* at a `gnoma acp`
+  *server* subprocess and run a prompt end-to-end — exercises both parts
+  over a real stdio pipe.
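+
+A possible core for the stdout purity test listed above, as a hedged
+sketch. It assumes the stdio transport carries one JSON-RPC message per
+line (confirm against the pinned ACP spec); `checkPureJSONRPC` is a
+hypothetical helper name, not an existing gnoma function.
+
+```go
+package main
+
+import (
+	"bufio"
+	"encoding/json"
+	"fmt"
+	"strings"
+)
+
+// checkPureJSONRPC returns an error naming the first captured stdout line
+// that is not a JSON-RPC 2.0 object. Blank lines count as leakage too.
+func checkPureJSONRPC(stdout string) error {
+	sc := bufio.NewScanner(strings.NewReader(stdout))
+	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // accept frames up to 1 MiB
+	line := 0
+	for sc.Scan() {
+		line++
+		var frame struct {
+			JSONRPC string `json:"jsonrpc"`
+		}
+		if err := json.Unmarshal(sc.Bytes(), &frame); err != nil {
+			return fmt.Errorf("line %d is not a JSON object: %q", line, sc.Text())
+		}
+		if frame.JSONRPC != "2.0" {
+			return fmt.Errorf("line %d is not a JSON-RPC 2.0 frame: %q", line, sc.Text())
+		}
+	}
+	return sc.Err()
+}
+
+func main() {
+	clean := `{"jsonrpc":"2.0","id":1,"result":{}}` + "\n"
+	leaky := "starting up...\n" + clean
+	fmt.Println(checkPureJSONRPC(clean))
+	fmt.Println(checkPureJSONRPC(leaky))
+}
+```
+
+In the real test the input would be the captured stdout of a full
+`gnoma acp` prompt turn rather than a literal string.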
+
+### Acceptance criteria
+
+**Part A (agent/server):**
+
+1. `gnoma acp` speaks the handshake and a full prompt turn over stdio.
+2. gnoma appears and works as an external agent in Zed (manual: add
+   gnoma to Zed's external-agents config, run a prompt, approve a tool).
+3. Tool permission prompts surface in the client and gate execution.
+4. stdout carries only JSON-RPC; all logs go to stderr.
+5. Cancelling from the editor stops the turn.
+6. MCP servers declared by the client in `session/new` are available in
+   that session.
+
+**Part B (client):**
+
+7. An external ACP agent configured under `[acp.agents]` appears as a
+   router arm (`gnoma providers` lists it) with `ToolUse:true`.
+8. Routing a task to that arm runs a full turn via ACP, surfacing the
+   sub-agent's tool calls **structured** in gnoma's stream.
+9. The sub-agent's permission requests are gated by the user's gnoma
+   permission policy (not auto-approved).
+10. The sub-agent's file writes pass through gnoma's fs guard.
+11. Loopback: `gnoma acp` driven by gnoma's own ACP-client completes a
+    prompt end-to-end.
+
+---
+
+## Open questions (resolve against the live spec at implementation)
+
+- Exact `protocolVersion` to target and the precise capability struct
+  shapes (the schema is the source of truth; pin a version).
+- Whether to advertise client-side `fs/*` (edits flow through the
+  editor's buffers) vs. direct-disk fs tools — depends on parity and on
+  how gnoma's `internal/tool/fs` guard composes with editor-mediated
+  writes.
+- `session/load` support (needs our session store to round-trip the
+  ACP transcript shape).
+- **(Part B)** How a sub-agent's own model/cost is represented in the
+  router — an ACP arm's tokens are billed by *that* agent, so
+  `CostWeight`/`CostPer1k*` are opaque. Likely model it like the
+  subprocess arms (no metered cost; selection driven by `Strengths`).
+- **(Part B)** Lifecycle: spawn-per-session vs. a pooled long-lived
+  agent process reused across turns; how cancellation and crashes are
+  recovered (ties to session error-recovery, `0d3d190`).
+
+---
+
+## TODO linkage
+
+New "Agent Client Protocol (ACP) support" entry in `TODO.md` (In
+flight) links here. Covers **both** roles: gnoma as ACP agent (Part A)
+and gnoma as ACP client driving external agents as router arms
+(Part B). Part B is the standardized successor to
+`internal/provider/subprocess` and overlaps the "Native agy JSON
+output" backlog item.
@@ -0,0 +1,156 @@
+# Config Migration — Follow-ups from Phase 1 (2026-06-04)
+
+Caveats discovered while shipping Phase 1 of
+[`2026-05-24-config-migration.md`](2026-05-24-config-migration.md) in
+commit `a9bba42`. The encoder-fix half is in; the issues below are
+either Phase 2+ of the same plan or adjacent cleanup that's now
+exposed because the file is being read more carefully than before.
+
+## Caveat 1 — `Duration` fields still emit zero-spam as raw int64
+
+**Where:** `internal/config/config.go:50, 57` —
+`SLM.StartupTimeout Duration` and `SLM.ClassifyTimeout Duration`.
+
+**Symptom:** Running `gnoma config set --global slm.enabled true`
+on a fresh global config produces:
+
+```toml
+[slm]
+  enabled = true
+  startup_timeout = 0
+  classify_timeout = 0
+```
+
+`startup_timeout = 0` and `classify_timeout = 0` are emitted even
+with `,omitempty` on the struct tags. The `Duration` type only has
+`UnmarshalText` (`config.go:393`) — no `MarshalText` — so
+BurntSushi falls back to encoding the underlying `int64` nanosecond
+value, and `omitempty` doesn't apply to the custom type at the
+field level.
+
+**Why it's pre-existing:** The original `setConfig` predates the
+`omitempty` work in Phase 1. The encoder always wrote the full
+struct, so the Duration-as-int64 behavior was always there but
+masked by the surrounding zero-spam from other fields.
+
+**Severity:** Cosmetic. `0` is the documented "use built-in
+default" sentinel for both fields — `defaultClassifyTimeout = 15s`
+in `internal/slm/classifier.go:23` and the llamafile startup
+timeout defaults to 5s. So the file's `0` values are semantically
+equivalent to absent; the resolver passes them through unchanged.
+
+**Fix (small PR, ~30 lines):**
+
+Convert the two Duration fields to `*Duration` (pointer), matching
+the seven fields already converted in Phase 1. nil = "use
+default"; `*Duration(0)` = "explicit zero". The
+`ResolvedSLMSection` mirror also has to be added in this PR: the
+SLM section is currently un-mirrored, because Phase 1 only mirrored
+Provider / Tools / Security / Router / Session / Hooks (the sections
+that had pointer-converted fields).
+
+Implementation steps:
+
+1. `SLM.StartupTimeout *Duration` and `SLM.ClassifyTimeout *Duration`
+   in `internal/config/config.go`.
+2. `Defaults()` populates them with the documented defaults
+   (`5s` and `0s` respectively — note the `*Duration(0)` for
+   ClassifyTimeout is intentional: 0 means "let the SLM layer
+   pick its own 15s default", per the existing field comment).
+3. Add `ResolvedSLMSection` to `internal/config/resolve.go`. Update
+   `ResolvedConfig` to include it. Route all existing SLM readers
+   (`cmd/gnoma/main.go:865-870, 884, 1525, 1554-1561, 1617-1657`;
+   `internal/tui/app.go:245`) through the mirror.
+4. Test: `TestSetGlobalConfig_DurationFieldOmitsAtZero` — set
+   `slm.enabled = true`, assert the file does NOT contain
+   `startup_timeout` or `classify_timeout`.
+5. Update `internal/config/config_test.go:454-499` (the three
+   `TestSLMSection_RegisterAsArm_*` tests) to keep working with
+   the new pointer types — they're load-side tests and just need
+   nil-or-deref assertions.
+
+Risk: low. The SLM section is read in many places, but the
+`Defaults()` baseline is updated at the same time so the
+*resolved* values are byte-identical to today's behavior.
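+
+A minimal sketch of the pointer semantics described above, using only
+the standard library. `resolveDuration` is a hypothetical helper name,
+and `MarshalText` is an extra that the listed steps do not call for; it
+is included only to show the text form a non-nil value could take
+instead of a raw nanosecond integer.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// Duration mirrors the config type discussed above: a time.Duration that
+// is read from text such as "15s".
+type Duration time.Duration
+
+// UnmarshalText parses values like "15s" or "1m30s".
+func (d *Duration) UnmarshalText(b []byte) error {
+	v, err := time.ParseDuration(string(b))
+	if err != nil {
+		return err
+	}
+	*d = Duration(v)
+	return nil
+}
+
+// MarshalText writes the human-readable form rather than nanoseconds.
+func (d Duration) MarshalText() ([]byte, error) {
+	return []byte(time.Duration(d).String()), nil
+}
+
+// resolveDuration applies the nil-means-default rule: a nil pointer takes
+// the fallback, a non-nil pointer is used as written (including zero).
+func resolveDuration(p *Duration, fallback time.Duration) time.Duration {
+	if p == nil {
+		return fallback
+	}
+	return time.Duration(*p)
+}
+
+func main() {
+	var unset *Duration
+	fmt.Println(resolveDuration(unset, 5*time.Second)) // key absent from the file
+
+	var d Duration
+	if err := d.UnmarshalText([]byte("15s")); err != nil {
+		panic(err)
+	}
+	fmt.Println(resolveDuration(&d, 5*time.Second)) // key set explicitly
+
+	text, err := d.MarshalText()
+	fmt.Println(string(text), err)
+}
+```
+
+Because nil and explicit zero are distinct here, a file that never
+mentions the key has nothing to emit on write-back.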
+
+## Caveat 2 — Pre-existing zero-spam is not auto-cleaned
+
+**Where:** Any user config file that was written by a `gnoma`
+release predating `a9bba42`. The 2026-05-24 symptom was the
+project file containing `[router] prefer = ""` after an earlier
+`gnoma config set ...` call.
+
+**Phase 1 behavior:** `setConfig` continues to round-trip the
+file: read existing → decode overlays the struct → apply one
+change → write back. The `,omitempty` tags mean a field that was
+*absent* from the source is not emitted. A field that was
+*present-but-zero* in the source is still re-emitted as zero
+(the decoder sees it, the encoder writes it back).
+
+**User's recovery path today:** Re-set the affected key, e.g.
+`gnoma config set router.prefer cloud`. The decoder reads
+`prefer = ""` into the struct, the setter overwrites it with
+`"cloud"`, the encoder writes `prefer = "cloud"`. The zero-spam
+is gone — for that field, on that file. Other zero-spam in the
+same file stays until the user re-sets each affected key
+individually.
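+
+The absent versus present-but-zero distinction can be reproduced in a
+few lines. This sketch is an analogy, not gnoma's code: it uses the
+standard library's `encoding/json` in place of BurntSushi's TOML codec,
+and `routerSection` with a pointer `Prefer` field is a hypothetical
+stand-in for a pointer-converted config field.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// routerSection is a hypothetical config section with one optional key.
+type routerSection struct {
+	Prefer *string `json:"prefer,omitempty"`
+}
+
+// roundTrip decodes src and immediately re-encodes it, like a
+// read-modify-write setter that changes nothing.
+func roundTrip(src string) (string, error) {
+	var r routerSection
+	if err := json.Unmarshal([]byte(src), &r); err != nil {
+		return "", err
+	}
+	out, err := json.Marshal(r)
+	return string(out), err
+}
+
+func main() {
+	for _, src := range []string{`{}`, `{"prefer":""}`} {
+		out, err := roundTrip(src)
+		if err != nil {
+			panic(err)
+		}
+		fmt.Printf("%s -> %s\n", src, out)
+	}
+}
+```
+
+An absent key decodes to a nil pointer and is dropped on write, while a
+key present with an empty value decodes to a non-nil pointer and
+survives the round trip. That is the same reason old zero-spam persists
+until each key is re-set.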
+
+**Why this isn't in Phase 1:** the alternative — "drop fields
+whose value equals the default" — is a *read-modify-write* of the
+existing file that needs to know which keys were present in the
+source. BurntSushi's encoder doesn't expose that; the plan defers
+it to `gnoma upgrade-config` (Phase 4).
+
+**Fix (the Phase 4 plan, ~200 lines):** `gnoma upgrade-config`
+with per-file backup, diff output, and `--all-projects` mode.
+Out of scope for this follow-up doc; lives in the original
+[`2026-05-24-config-migration.md` Phase 4 section](2026-05-24-config-migration.md#phase-4--gnoma-upgrade-config).
+
+**What this caveat doc *does* add:** a one-line README note under
+the config section flagging that pre-`a9bba42` config files may
+have accumulated zero-spam, and pointing at `gnoma upgrade-config`
+as the cleanup tool once it ships.
+
+## Caveat 3 — `BanditSection` keeps the 0-sentinel pattern
+
+**Where:** `internal/config/config.go:194-215` (`QualityAlpha`,
+`MinObservations`, `ObservedWeight`, `StrengthBonus`).
+
+**Status:** intentional, kept as-is per the Phase 1 plan. The
+doc comments on each field document 0 as "use default" and the
+consumers (`internal/router/feedback.go`, `selector.go`) already
+handle 0-sentinel values. Pointer conversion would force every
+reader to deref for a knob that nobody sets by hand.
+
+**Fix:** none planned. The risk if anyone ever does set these
+explicitly to 0 (intending "off" or "no effect") is the same
+silent-shadowing pattern Phase 1 fixed elsewhere — but the
+comment-documented 0-sentinel is a deliberate contract here.
+Documented so the next person reviewing the code doesn't try to
+"fix" it.
+
+## Ordering and dependencies
+
+| # | Item | Depends on | Estimated size |
+|---|---|---|---|
+| 1 | Duration pointer conversion | nothing | 1 PR, ~30 lines |
+| 2 | `gnoma upgrade-config` (Phase 4) | nothing | 1 PR, ~200 lines |
+| 3 | Project registry (Phase 2) | nothing | 1 PR, ~150 lines |
+| 4 | `gnoma doctor` (Phase 3) | Project registry (Phase 2) | 1 PR, ~250 lines |
+| 5 | Auto-migration (Phase 5) | Phases 1-4 in production | deferred one release |
+
+Phase 2 (registry) and Phase 3 (doctor) are independent of the
+Duration fix and of `upgrade-config`. Doctor does depend on the
+registry: without one it has to fall back to a filesystem scan, which
+is slow on big machines. Land the registry first.
+
+## Not in this doc
+
+- Sensitive-content policy (separate plan:
+  [`2026-05-24-sensitive-content-policy.md`](2026-05-24-sensitive-content-policy.md))
+- Egress allowlist (separate plan:
+  [`2026-06-04-egress-allowlist.md`](2026-06-04-egress-allowlist.md))
+- MiniMax provider (separate plan:
+  [`2026-06-04-minimax-provider.md`](2026-06-04-minimax-provider.md))
+- ACP (separate plan:
+  [`2026-06-04-agent-client-protocol.md`](2026-06-04-agent-client-protocol.md))
@@ -0,0 +1,198 @@
+# Cross-Platform Support (Windows + macOS) — 2026-06-04
+
+GoReleaser already builds binaries for
+`linux/darwin/windows × amd64/arm64`, but only the Linux ones are ever
+exercised. This plan makes the Windows and macOS binaries actually work
+and stay working. It promotes the TODO.md entry
+"Cross-platform support — Windows + macOS" into a phased design with
+concrete code touch-points.
+
+This plan does not restate the TODO's r/devops question map (Phase 2
+table there stands). Its value-add is the **specific code locations**
+that need OS-conditional handling and the build-tag pattern to use.
+
+---
+
+## Problem
+
+Only Linux is tested. The binaries ship for Windows/macOS untested, and
+the codebase has several hard Unix assumptions that will fail or
+silently misbehave off-Linux. The pattern to follow already exists:
+`internal/mcp/transport_{unix,windows}.go` split via build tags.
+
+---
+
+## Non-goals
+
+- **MSI installer, Authenticode/Gatekeeper signing.** Covered by
+  `2026-06-04-distribution-followups.md` — those are packaging, not
+  runtime correctness.
+- **Group Policy / Event Viewer integration.** Out of scope per the
+  TODO; documentation-only.
+- **WSL-specific tuning.** WSL is Linux; it works today.
+
+---
+
+## Confirmed Unix-assumption defects (file:line)
+
+### Critical — break core functionality on Windows
+
+1. **Bash tool hardcodes `bash -c`.**
+   `internal/tool/bash/bash.go:117` →
+   `exec.CommandContext(ctx, "bash", "-c", command)`. No Windows shell.
+   Alias harvesting (`internal/tool/bash/aliases.go:115,148`) hardcodes
+   `/bin/bash` and splits the shell path on `/`.
+2. **Llamafile SLM startup hardcodes `sh`.**
+   `internal/slm/manager.go:172` invokes `sh <llamafile>` (a Wine
+   binfmt workaround). `sh` is absent on native Windows → `gnoma slm
+   status/setup` fails outright.
+3. **MCP process-tree kill is a Windows stub.**
+   `internal/mcp/transport_windows.go:10-18` — `setProcessGroup` is a
+   no-op and `killProcessTree` calls `p.Kill()`, leaking any child
+   processes an MCP server spawns. Unix version uses process groups
+   (`transport_unix.go:11-18`).
+
+### High — config/auth land in the wrong place off-Linux
+
+4. **Config/data dirs assume XDG.**
+   `internal/config/load.go:52-59` falls back to `~/.config`;
+   `internal/slm/manager.go:25-35` falls back to `~/.local/share`. On
+   Windows these should be `os.UserConfigDir()` (`%AppData%`) /
+   `os.UserCacheDir()`. On macOS, native tools use
+   `~/Library/Application Support`, though `~/.config` is tolerable;
+   decide and document.
+5. **OAuth credential discovery is Unix-pathed.**
+   `internal/provider/google/provider.go:188-204` hardcodes
+   `~/.config/...` and `~/.gemini/...`. `expandHome` (`:114-129`)
+   already handles `\`, but the path *set* is Unix-centric — Gemini/
+   Antigravity creds on macOS/Windows won't be found.
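+6. **No system-proxy support.** No explicit `http.ProxyFromEnvironment`
+   wiring found. Go's `http.DefaultTransport` reads the
+   `HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY` env vars, but a custom
+   `http.Transport` only does so when its `Proxy` field is set, and the
+   stdlib never reads the Windows system proxy / PAC. Corporate Windows
+   networks rely on these.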
+
+### Medium — usability / safety classifier gaps
+
+7. **Safety classifier misses macOS/Windows paths.**
+   `internal/safety/cwd.go`: the macOS system roots (`:185-210`) miss
+   `/opt` and `/usr/local`; personal-dir detection (`:221-252`) misses
+   Windows `%TEMP%`/`%APPDATA%` and macOS `~/Library/...`.
+8. **Terminal/ANSI.** TUI uses lipgloss/termenv (auto-detects), so
+   modern Windows Terminal/PowerShell 7 are fine; legacy `conhost.exe`
+   may mangle. Verify, don't assume.
+
+---
+
+## Design
+
+### Phase 0 — build-tag scaffolding
+
+Adopt the existing `_unix.go` / `_windows.go` split (as in
+`internal/mcp`) for each defect that needs divergent behaviour. Prefer
+`runtime.GOOS` only for small inline branches (as
+`internal/safety/cwd.go:201` already does); use build tags when the
+implementation genuinely differs (shell selection, process kill).
+
+### Phase 1 — smoke tests (unblocks the honest "did you test it?" answer)
+
+Non-blocking GitHub Actions matrix (`windows-latest`, `macos-latest`,
+`ubuntu-latest`):
+
+- `go build ./...` and `go test ./...` per OS (today the release
+  workflow tests Linux only — `.github/workflows/release.yml`).
+- Post-release: download each archive, run `gnoma --version` and a
+  stubbed `echo hi | gnoma --provider ollama` against a fake endpoint.
+  Confirms the binary launches and the TUI doesn't crash.
+
+This is the precondition the TODO names for posting to r/devops.
+
+### Phase 2 — shell abstraction (defects #1, #2)
+
+1. Introduce `internal/tool/bash/shell_unix.go` /
+   `shell_windows.go` exposing `defaultShell() (name string, args
+   []string)` and a `quoteArg(string) string`:
+   - Unix: `bash`/`$SHELL`, `-c`, POSIX quoting.
+   - Windows: prefer `pwsh`/`powershell` with the appropriate
+     `-Command` invocation and PowerShell quoting rules; fall back to
+     `cmd /c`. Document the choice.
+2. Fix `aliases.go` to use `filepath.Base` instead of splitting on `/`,
+   and skip alias harvesting on Windows shells that have no equivalent.
+3. Llamafile: on Windows, invoke the `.llamafile` (which is a valid
+   Windows PE as well as a shell script) directly rather than via `sh`;
+   guard with a build tag.
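+
+The shell abstraction in steps 1 and 2 could look roughly like this.
+It is a sketch under assumptions: the plan calls for separate
+`shell_unix.go` / `shell_windows.go` files behind build tags, whereas
+this folds both branches into one file with `runtime.GOOS` purely so
+the example stays self-contained. `quotePOSIX` and `shellName` are
+hypothetical helper names, and PowerShell quoting is left out.
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"runtime"
+	"strings"
+)
+
+// defaultShell picks an interpreter plus the flags that must precede the
+// command string.
+func defaultShell() (name string, args []string) {
+	if runtime.GOOS == "windows" {
+		// Prefer PowerShell 7, then Windows PowerShell, then cmd.exe.
+		for _, candidate := range []string{"pwsh", "powershell"} {
+			if _, err := exec.LookPath(candidate); err == nil {
+				return candidate, []string{"-NoProfile", "-Command"}
+			}
+		}
+		return "cmd", []string{"/c"}
+	}
+	if sh := os.Getenv("SHELL"); sh != "" {
+		return sh, []string{"-c"}
+	}
+	return "bash", []string{"-c"}
+}
+
+// quotePOSIX wraps s in single quotes and escapes embedded single quotes,
+// which is the standard POSIX-shell way to pass a literal argument.
+func quotePOSIX(s string) string {
+	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
+}
+
+// shellName extracts the shell's base name with filepath.Base instead of
+// splitting on "/", so the host OS's path separator is respected.
+func shellName(path string) string {
+	return filepath.Base(path)
+}
+
+func main() {
+	name, args := defaultShell()
+	fmt.Println(shellName(name), args)
+	fmt.Println(quotePOSIX("it's here"))
+}
+```
+
+A caller would build the command as
+`exec.CommandContext(ctx, name, append(args, command)...)`.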
+
+### Phase 3 — process management (defect #3)
+
+Implement Windows job objects via `golang.org/x/sys/windows` in
+`transport_windows.go` (and any other subprocess owner —
+`internal/provider/subprocess`, `internal/tool/bash`): create a job,
+assign the child, `TerminateJobObject` on close to reap the whole tree.
+Shared helper so MCP and bash tool both get tree-kill. (This is the
+same item the distribution TODO references.)
+
+### Phase 4 — paths + proxy (defects #4, #5, #6)
+
+1. Replace XDG fallbacks with `os.UserConfigDir()` / `os.UserCacheDir()`
+   on Windows (keep honouring XDG on Unix). Centralise in one
+   `configDir()` / `dataDir()` helper so it's not re-derived.
+2. Extend the OAuth credential path sets with OS-appropriate locations
+   (macOS `~/Library/Application Support/...`, Windows `%AppData%/...`).
+3. Ensure every `http.Client` uses a transport with
+   `Proxy: http.ProxyFromEnvironment`. For Windows system-proxy/PAC,
+   document the env-var workaround now; optionally vendor a PAC-aware
+   transport (e.g. `github.com/rapid7/go-get-proxied`) later. This
+   overlaps the shared-client work in
+   `2026-06-04-egress-allowlist.md` — do the proxy transport once, in
+   the shared client.
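+
+A hedged sketch of the centralised helpers and the proxy-aware client
+from the list above. `xdgDir` and `newSharedClient` are hypothetical
+names. Note that `os.UserConfigDir()` returns
+`~/Library/Application Support` on macOS, so using it unconditionally
+settles the macOS question from defect #4 in favour of the native
+location; adjust if the decision goes the other way.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"os"
+	"path/filepath"
+	"runtime"
+)
+
+// xdgDir returns <$envKey>/<app> when the variable holds an absolute
+// path, otherwise <fallback>/<app>. getenv is injected for testability.
+func xdgDir(getenv func(string) string, envKey, fallback, app string) string {
+	if v := getenv(envKey); filepath.IsAbs(v) {
+		return filepath.Join(v, app)
+	}
+	return filepath.Join(fallback, app)
+}
+
+// configDir defers to the stdlib, which already honours XDG_CONFIG_HOME
+// on Unix and %AppData% on Windows.
+func configDir(app string) (string, error) {
+	base, err := os.UserConfigDir()
+	if err != nil {
+		return "", err
+	}
+	return filepath.Join(base, app), nil
+}
+
+// dataDir keeps the XDG data location on Unix and uses the user cache
+// directory on Windows, as the plan proposes.
+func dataDir(app string) (string, error) {
+	if runtime.GOOS == "windows" {
+		base, err := os.UserCacheDir()
+		if err != nil {
+			return "", err
+		}
+		return filepath.Join(base, app), nil
+	}
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return "", err
+	}
+	return xdgDir(os.Getenv, "XDG_DATA_HOME", filepath.Join(home, ".local", "share"), app), nil
+}
+
+// newSharedClient clones the default transport and sets the env-var proxy
+// explicitly. It assumes http.DefaultTransport has not been replaced.
+func newSharedClient() *http.Client {
+	transport := http.DefaultTransport.(*http.Transport).Clone()
+	transport.Proxy = http.ProxyFromEnvironment
+	return &http.Client{Transport: transport}
+}
+
+func main() {
+	cfg, err := configDir("gnoma")
+	fmt.Println(cfg, err)
+	data, err := dataDir("gnoma")
+	fmt.Println(data, err)
+	fmt.Println(newSharedClient().Transport != nil)
+}
+```
+
+The client deliberately sets no overall timeout, since a streaming
+provider response can legitimately run for a long time.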
+
+### Phase 5 — safety classifier + terminal (defects #7, #8)
+
+Extend `internal/safety/cwd.go` system-root and personal-dir sets per
+OS; add a manual verification note for legacy Windows terminals.
+
+---
+
+## Touch-points (file:line)
+
+| Defect | Location |
+|---|---|
+| Bash shell | `internal/tool/bash/bash.go:117`, `aliases.go:115,148` |
+| Llamafile `sh` | `internal/slm/manager.go:172` |
+| MCP kill stub | `internal/mcp/transport_windows.go:10-18` |
+| Config/data dirs | `internal/config/load.go:52-59`, `internal/slm/manager.go:25-35` |
+| OAuth paths | `internal/provider/google/provider.go:188-204` |
+| Proxy | shared `http.Client` (see egress plan) |
+| Safety classifier | `internal/safety/cwd.go:185-252` |
+| CI matrix | `.github/workflows/` (new test job), `release.yml` |
+
+---
+
+## Testing (TDD — write first)
+
+- **OS-gated unit tests** (run on each matrix OS):
+  - `defaultShell()` returns a runnable shell per OS; `quoteArg`
+    round-trips a value containing spaces/quotes through the real shell.
+  - `configDir()`/`dataDir()` return the OS-correct base.
+  - Job-object kill: spawn a child that spawns a grandchild; assert
+    both are gone after `killProcessTree` (Windows).
+  - `safety.ClassifyCWD` flags OS-appropriate system/personal dirs.
+- **Existing tests** that `t.Skip` on Windows
+  (`internal/tool/fs/guard_test.go`,
+  `internal/provider/subprocess/stream_test.go`) — audit whether the
+  skip hides a real gap now that Windows is a target.
+
+### Acceptance criteria
+
+1. CI smoke matrix is green on `windows-latest` + `macos-latest`.
+2. `gnoma --version` and a stubbed pipe run succeed on a Windows runner.
+3. A bash-tool command with quoted args runs on Windows (PowerShell).
+4. An MCP server that spawns a child leaves no orphan after shutdown on
+   Windows.
+5. Config lands in `%AppData%\gnoma` on Windows, `~/.config/gnoma` on
+   Linux.
+
+---
+
+## TODO linkage
+
+Promotes the "Cross-platform support — Windows + macOS" entry in
+`TODO.md`. The Phase-2 r/devops question table stays in the TODO as the
+public-facing answer map; link this plan for the implementation detail.
@@ -0,0 +1,169 @@
+# Distribution Follow-ups — 2026-06-04
+
+Hardens and broadens the release pipeline. v0.1.0+ already ships static
+archives (GitHub mirror releases) and multi-arch Docker images (GHCR)
+via GoReleaser. This plan covers the optional follow-ups listed under
+"Distribution — follow-ups" in TODO.md: signed checksums, Homebrew tap,
+`curl | sh` installer, release-note automation, and the
+`dockers`→`dockers_v2` migration.
+
+---
+
+## Current state (confirmed)
+
+- **`.goreleaser.yml`:** 6-target build matrix (linux/darwin/windows ×
+  amd64/arm64), CGO disabled, version injected via ldflags
+  (`-X main.buildVersion/buildCommit/buildDate`; read at
+  `cmd/gnoma/main.go:55-60`, printed at `:95-98`). Archives: tar.gz
+  (zip on Windows). Checksums: plain SHA256 `checksums.txt`,
+  **unsigned**. Docker: separate per-arch `dockers` blocks +
+  `docker_manifests` for the multi-arch manifest. Release published to
+  GitHub mirror (`release.github` owner `VikingOwl91`).
+- **`.github/workflows/release.yml`:** triggers on `v*` tags, sets up
+  QEMU + Buildx, logs into GHCR with the built-in `GITHUB_TOKEN`, runs
+  `go test ./...` (Linux only), then `goreleaser release --clean` with
+  `GORELEASER_CURRENT_TAG` set. **No signing step.**
+- **`Dockerfile`:** distroless `static:nonroot`, copies the
+  GoReleaser-built binary in. Architecture-agnostic (binary built
+  before `COPY`).
+- **No** Homebrew tap, install script, or Makefile release target.
+
+---
+
+## Non-goals
+
+- **Authenticode (Windows) / Gatekeeper notarization (macOS) code
+  signing.** These need a paid EV cert / Apple Developer account —
+  tracked separately (the cross-platform TODO documents the
+  "right-click → Unblock" workaround). Sigstore/cosign here is for
+  *checksum* signing, which needs no paid cert.
+- **MSI installer.** Lives in the cross-platform plan, gated on demand.
+- **Changing the canonical repo flow.** PRs still go to the Gitea
+  upstream; the GitHub mirror remains the release/CI surface.
+
+---
+
+## Design (independent work items — ship in any order)
+
+### 1. Signed checksums (cosign / sigstore keyless)
+
+Add a GoReleaser `signs` block that signs `checksums.txt` with cosign
+in **keyless** mode (OIDC via the GitHub Actions token — no stored
+private key, no cert cost):
+
+- Add `cosign` install + `id-token: write` permission to
+  `release.yml`.
+- GoReleaser `signs:` → `cmd: cosign`, `args: sign-blob` producing
+  `checksums.txt.sig` + `.pem` (cert bundle) as release artifacts.
+- Document verification:
+  `cosign verify-blob --certificate ... --signature ... checksums.txt`.
+
+Acceptance: a downloaded release verifies offline against the published
+signature + Rekor transparency log.
+
+### 2. Homebrew tap
+
+Create a tap repo (`VikingOwl91/homebrew-tap`) and add GoReleaser's
+`brews:` block targeting it. Needs a PAT with `contents:write` on the
+tap repo (the default `GITHUB_TOKEN` can't push to a *second* repo) —
+store as `HOMEBREW_TAP_TOKEN` secret. Formula installs the darwin/linux
+archives.
+
+Acceptance: `brew install vikingowl91/tap/gnoma` installs a working
+binary on macOS + Linuxbrew; `gnoma --version` matches the tag.
+
+### 3. `curl | sh` installer
+
+Add `install.sh` (committed at repo root, served via the raw GitHub
+mirror) that:
+
+- Detects OS/arch, maps to the GoReleaser archive name template
+  (`gnoma_<ver>_<os>_<arch>.<ext>`).
+- Resolves the latest release via the GitHub API (or honours a pinned
+  `GNOMA_VERSION`).
+- Downloads the archive **and** `checksums.txt`, verifies the SHA256
+  before extracting (and the cosign signature if cosign is present).
+- Installs to `~/.local/bin` (or `$GNOMA_INSTALL_DIR`), prints a PATH
+  hint.
+
+Keep it POSIX-sh, no bashisms. Acceptance:
+`curl -fsSL <raw>/install.sh | sh` yields a runnable `gnoma` on a clean
+Linux + macOS box; checksum mismatch aborts.
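+
+The verification step is the part worth getting exactly right, so here
+is its logic as a small Go sketch (for example, for the CI job that
+exercises the installer). `install.sh` itself stays POSIX sh; this is
+not a replacement for it. The sketch assumes `checksums.txt` uses the
+`sha256sum` layout of `<hex digest>  <file name>` per line, and the
+function names are hypothetical.
+
+```go
+package main
+
+import (
+	"bufio"
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+	"strings"
+)
+
+// lookupChecksum finds the hex digest recorded for name in
+// sha256sum-style text.
+func lookupChecksum(checksums, name string) (string, bool) {
+	sc := bufio.NewScanner(strings.NewReader(checksums))
+	for sc.Scan() {
+		fields := strings.Fields(sc.Text())
+		// A leading "*" marks binary mode in sha256sum output.
+		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == name {
+			return fields[0], true
+		}
+	}
+	return "", false
+}
+
+// verifyArchive fails when the archive has no entry or its SHA-256 does
+// not match the recorded digest.
+func verifyArchive(archive []byte, checksums, name string) error {
+	want, ok := lookupChecksum(checksums, name)
+	if !ok {
+		return fmt.Errorf("no checksum entry for %s", name)
+	}
+	sum := sha256.Sum256(archive)
+	if got := hex.EncodeToString(sum[:]); !strings.EqualFold(got, want) {
+		return fmt.Errorf("checksum mismatch for %s: got %s, want %s", name, got, want)
+	}
+	return nil
+}
+
+func main() {
+	const name = "example_archive.tar.gz" // placeholder file name
+	archive := []byte("pretend archive bytes")
+	sum := sha256.Sum256(archive)
+	checksums := hex.EncodeToString(sum[:]) + "  " + name + "\n"
+
+	fmt.Println(verifyArchive(archive, checksums, name))
+	fmt.Println(verifyArchive([]byte("tampered"), checksums, name))
+}
+```
+
+A missing entry is treated as a failure on purpose: an installer that
+skips verification when the file name is absent can be bypassed by
+renaming the archive.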
+
+### 4. Release-note automation
+
+GoReleaser already generates a filtered changelog (excludes
+docs/test/chore/style). Enrich it:
+
+- Group commits by Conventional-Commit type
+  (`changelog.groups` with title regexes for feat/fix/perf/refactor).
+- Add a release header template pointing to the upstream Gitea repo and
+  the install methods (brew / curl | sh / docker).
+
+Acceptance: a tagged release's GitHub notes show grouped sections + an
+install snippet, with no docs/chore noise.
+
+### 5. `dockers` → `dockers_v2` migration
+
+Collapse the two per-arch `dockers` blocks + `docker_manifests` into a
+single `dockers_v2` block (GoReleaser's newer multi-platform builder).
+The current `Dockerfile` is architecture-agnostic (binary copied
+post-build), so verify whether `dockers_v2`'s expected per-platform
+binary layout needs a `Dockerfile` change or a `templates`/`extra_files`
+tweak — the TODO flags this as the reason it was deferred. Do it in its
+own commit; diff the resulting GHCR manifest against the current one to
+prove parity (same tags: `<ver>-amd64`, `<ver>-arm64`, `<ver>`,
+`latest`).
+
+Acceptance: GHCR still publishes a multi-arch manifest with identical
+tags + labels; `docker pull --platform linux/arm64` works.
+
+### 6. (Carry-over) Windows process-tree kill
+
+Listed under the same TODO entry, but it is a *runtime* concern and is
+implemented in `2026-06-04-cross-platform.md` Phase 3 (job objects).
+Cross-linked here only so the TODO bullet's reference resolves.
+
+---
+
+## Touch-points (file:line)
+
+| Item | Location |
+|---|---|
+| Signing, brews, changelog groups, dockers_v2 | `.goreleaser.yml` |
+| cosign install, `id-token` perm, tap token | `.github/workflows/release.yml` |
+| Installer | new `install.sh` (repo root) |
+| Dockerfile (if dockers_v2 needs it) | `Dockerfile` |
+| Tap repo | new `VikingOwl91/homebrew-tap` |
+
+---
+
+## Testing
+
+Distribution is config + scripts, so testing is mostly pipeline-level:
+
+- **Dry run:** `goreleaser release --snapshot --clean` locally must
+  produce signed checksums, brew formula, and the dockers_v2 manifest
+  without publishing.
+- **install.sh:** a `shellcheck` gate + a CI job that runs it against
+  the latest release on linux + macos runners and asserts
+  `gnoma --version`.
+- **Checksum/signature negative test:** corrupt the archive → installer
+  aborts; tampered checksums → cosign verify fails.
+
+### Acceptance criteria
+
+1. A tagged release publishes `checksums.txt` + `.sig` + `.pem`,
+   verifiable with cosign keyless.
+2. `brew install vikingowl91/tap/gnoma` works on macOS.
+3. `curl -fsSL <raw>/install.sh | sh` works on clean Linux + macOS,
+   with checksum verification.
+4. Release notes are grouped and carry install instructions.
+5. GHCR multi-arch manifest is unchanged after the dockers_v2 swap.
+
+---
+
+## TODO linkage
+
+Promotes the "Distribution — follow-ups" entry in `TODO.md`. Link this
+file; the Windows job-object sub-item points at the cross-platform plan.
@@ -0,0 +1,236 @@
+# Network Egress Allowlist — 2026-06-04
+
+Adds a per-host network egress boundary to the security layer via a
+Learn → Review → Enforce rollout. Promotes the second half of the
+TODO.md entry "Security boundary — egress controls + session audit log"
+into a phased design.
+
+---
+
+## Status of the sibling item: per-session audit log — DONE
+
+The first half of the TODO entry (per-session audit log of
+blocked/redacted events) is **already implemented**:
+
+- `internal/security/audit.go` defines `AuditLogger` / `AuditEvent`,
+  writing append-only JSONL at mode `0o600`, incognito-gated,
+  best-effort (write failures never break the scan pipeline).
+- `cmd/gnoma/main.go:685-691` wires it to
+  `<projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl`.
+- `internal/security/firewall.go` records events at `:152` (unicode
+  sanitize), `:173` (block), `:186` (redact).
+
+**Remaining audit-log gap:** there is no CLI surface to *read* it. The
+TODO's promise — answer "what did the firewall do this session?" in one
+command — needs a `gnoma firewall audit` subcommand (no `firewall`
+subcommand exists today; top-level commands are `providers`, `slm`,
+`router`, `profile`). That viewer is folded into Phase 3 below since it
+shares the `gnoma firewall` command surface with `firewall review`.
+
+The rest of this plan is the genuinely-unbuilt egress allowlist.
+
+---
+
+## Problem
+
+The current `Firewall` is a **content** boundary only: it scans
+messages and tool results for secrets (regex + Shannon entropy) and
+redacts/blocks/warns. It does **not** enforce network egress. Outgoing
+HTTP uses stock clients with no per-host allowlist and no dial-layer
+interception, so a compromised tool, MCP server, or prompt-injected
+provider call can reach any host.
+
+The README and v0.3.0 launch post oversold "network egress gated";
+this plan makes that claim true.
+
+### Why this is hard: no egress chokepoint today
+
+Outgoing HTTP is constructed in many places, none sharing a client:
+
+- **Provider SDKs** each build their own `http.Client` internally:
+  - anthropic (`internal/provider/anthropic/provider.go:36`,
+    `anthropic.NewClient`)
+  - openai (`internal/provider/openai/provider.go:46`, `oai.NewClient`)
+  - mistral (`internal/provider/mistral/provider.go:33`,
+    `mistralgo.NewClient`)
+  - google genai (`internal/provider/google/provider.go:239,306`)
+- **Non-SDK direct calls** using `http.DefaultClient` or ad-hoc
+  `&http.Client{}`:
+  - `internal/router/discovery.go` (`:65,141,325,365`)
+  - `internal/router/probe.go` (`:24,72`)
+  - `internal/slm/backend.go` (`:266,294,316,343`)
+  - `internal/slm/download.go` (`:22`)
+  - `internal/slm/manager.go` (`:273`)
+
+No custom `http.Client` is injected anywhere today. **But** every SDK
+supports injecting one, which is the enabler for a single chokepoint.
+
+---
+
+## Non-goals
+
+- **TLS interception / MITM.** We allowlist by destination host, not by
+  inspecting decrypted payloads. Content inspection stays the
+  firewall's job.
+- **Blocking the provider SDKs' own retry/telemetry hosts by default.**
+  Model-provider hosts are baseline-allowed (see below).
+- **Replacing the OS/network firewall.** This is an in-process
+  application-level guard, defense-in-depth, not a substitute for real
+  network controls. Document this honestly (the README over-claim is
+  the cautionary tale).
+
+---
+
+## Design
+
+### The chokepoint: one shared `http.Client` with a guarded dialer
+
+Build a single `*http.Client` whose `Transport.DialContext` validates
+the destination against the allowlist **before** the connection is
+made. `DialContext` receives `host:port` pre-resolution, so host-based
+matching works without DNS races. Thread this client everywhere.
+
+```
+internal/security/egress/
+  guard.go      // EgressGuard: mode + allowlist + Decide(host, callerCtx) ResultEnum
+  dialer.go     // GuardedDialer wrapping net.Dialer.DialContext
+  client.go     // HTTPClient(guard) *http.Client
+  store.go      // learned-destinations persistence (per project)
+  baseline.go   // curated ship-in-binary allowlist
+```
+
+**Injection mechanism per SDK** (each differs — enumerate, don't assume):
+
+| Client | Mechanism |
+|---|---|
+| anthropic | `option.WithHTTPClient(c)` appended in `anthropic/provider.go` |
+| openai | `option.WithHTTPClient(c)` appended in `openai/provider.go` |
+| google genai | `genai.ClientConfig{HTTPClient: c}` in `google/provider.go` |
+| mistral | **user's own SDK** — add `WithHTTPClient` option if absent (`github.com/VikingOwl91/mistral-go-sdk`), then use it |
+| non-SDK paths | replace `http.DefaultClient` and ad-hoc `&http.Client{}` with the shared client in `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` |
+
+Plumb the shared client into providers by adding
+`HTTPClient *http.Client` to `provider.ProviderConfig`
+(`internal/provider/registry.go:8-16`) and setting it in
+`createProvider`. The non-SDK paths take the client via their existing
+constructors / a package-level setter.
+
+> The non-SDK paths are the trap: if any is missed it punches a hole in
+> the allowlist. Treat the list above as a checklist; add a grep test
+> (Phase 4) that fails if `http.DefaultClient` or an ad-hoc
+> `&http.Client{}` reappears.
+
+### Three-stage rollout (not a single "block everything" default)
+
+**Learn.** First runs log every egress destination per `(project,
+agent, tool)` tuple to the per-project store **without blocking**.
+Reuse the audit JSONL discipline (append-only, incognito-gated,
+best-effort).
+
+**Review.** `gnoma firewall review` surfaces the captured set; the user
+marks each destination `allow | deny | scoped` (scoped = only reachable
+by named tool/agent). Persist to `.gnoma/firewall/allowlist.toml`
+(project) — subject to the same `omitempty`/atomic-write discipline as
+the config-migration plan (`2026-05-24-config-migration.md`) to avoid
+the zero-spam corruption class.
+
+**Enforce.** When mode is `enforce`, unrecognised destinations are
+blocked with a clear violation logged to the **same per-session
+`audit.jsonl`** (new `AuditEvent.Action = "egress_block"`). Mode is
+`[security.egress].mode = "off" | "learn" | "enforce"`, default `off`
+(opt-in; shipping `enforce` on by default would break first-run UX).
+
+### Baseline allowlist (curated, ship-in-binary)
+
+`baseline.go` seeds the allowlist so Enforce mode is usable immediately:
+
+- **Package ecosystems:** github.com, registry.npmjs.org, pypi.org,
+  files.pythonhosted.org, crates.io, static.crates.io,
+  registry-1.docker.io, proxy.golang.org, sum.golang.org.
+- **Model providers:** anthropic, openai, google, mistral, **minimax**
+  (per `2026-06-04-minimax-provider.md`) — host set derived from the
+  effective `[provider.endpoints]` map so user-configured local
+  ollama/llamacpp endpoints are auto-allowed.
+
+The painful middle ground is SDK egress (sentry, stripe, supabase,
+datadog…). These break a naive "block unknown" default, which is
+exactly why Learn → Review → Enforce is the only flow that scales.
+
+### Per-tool scoping
+
+`scoped` destinations carry an allowed-tool/agent set. Enforcement
+checks the calling context — the engine already knows which tool is
+running (it threads per-tool context for redaction logging today). Pass
+the tool/agent identity into `EgressGuard.Decide(host, callerCtx)`.
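
The mode and scoping rules above can be sketched as a single decision function. This is an illustration under stated assumptions: the `Rule` shape, the `"host|tool"` learning key, the boolean return, and the choice that an explicit rule is consulted before the baseline are all hypothetical, not settled design.

```go
package main

import "fmt"

type Mode string

const (
	ModeOff     Mode = "off"
	ModeLearn   Mode = "learn"
	ModeEnforce Mode = "enforce"
)

// Rule is one reviewed destination. Tools is the "scoped" set;
// an empty set means any caller may reach the host.
type Rule struct {
	Deny  bool
	Tools []string
}

type EgressGuard struct {
	Mode     Mode
	Baseline map[string]bool
	Rules    map[string]Rule
	Learned  map[string]bool // keyed "host|tool"; stands in for the per-project store
}

// Decide reports whether the named tool may reach host.
func (g *EgressGuard) Decide(host, tool string) bool {
	switch g.Mode {
	case ModeLearn:
		// Learn: record the destination, never block.
		if g.Learned == nil {
			g.Learned = map[string]bool{}
		}
		g.Learned[host+"|"+tool] = true
		return true
	case ModeEnforce:
		// Assumption: an explicit reviewed rule is consulted before the baseline.
		if r, ok := g.Rules[host]; ok {
			if r.Deny {
				return false
			}
			if len(r.Tools) == 0 {
				return true
			}
			for _, t := range r.Tools {
				if t == tool {
					return true
				}
			}
			return false
		}
		return g.Baseline[host]
	default:
		// Off (or unset): behave exactly as today.
		return true
	}
}

func main() {
	g := &EgressGuard{
		Mode:     ModeEnforce,
		Baseline: map[string]bool{"github.com": true},
		Rules:    map[string]Rule{"scoped.example": {Tools: []string{"fetch-tool"}}},
	}
	fmt.Println(g.Decide("github.com", "bash"))           // true
	fmt.Println(g.Decide("scoped.example", "bash"))       // false
	fmt.Println(g.Decide("scoped.example", "fetch-tool")) // true
}
```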
+
+---
+
+## Interactions
+
+- **Incognito:** Learn-mode writes are gated by incognito exactly like
+  the audit log (`IncognitoMode.ShouldLogContent`). Enforcement still
+  applies in incognito (security is not relaxed); only the *learning*
+  persistence is suppressed.
+- **Config layering:** the allowlist file is a new corruption surface —
+  follow `2026-05-24-config-migration.md` #1 discipline.
+- **SafeProvider:** egress is orthogonal to the content `SafeProvider`
+  wrap; it lives one layer down at the transport. Both must hold.
+
+---
+
+## Touch-points (file:line)
+
+| Change | Location |
+|---|---|
+| New egress package | `internal/security/egress/` |
+| `HTTPClient` field | `internal/provider/registry.go:8-16` |
+| Provider client injection | `anthropic/provider.go`, `openai/provider.go`, `google/provider.go`, `mistral/provider.go` |
+| mistral SDK `WithHTTPClient` | `github.com/VikingOwl91/mistral-go-sdk` (if absent) |
+| Non-SDK client swap | `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` |
+| `audit.go` egress action | `internal/security/audit.go` (`AuditEvent`) |
+| Config `[security.egress]` | `internal/config/config.go` (SecuritySection ~`:280-306`) |
+| `gnoma firewall` command | `cmd/gnoma/main.go` subcommand dispatch (~`:178`) |
+| Allowlist store | `.gnoma/firewall/allowlist.toml` |
+
+---
+
+## Testing (TDD — write first)
+
+- **Unit:**
+  - `EgressGuard.Decide`: off → always allow; learn → allow + record;
+    enforce → allow baseline/allowlisted, block unknown, scoped host
+    allowed only for the named tool.
+  - `GuardedDialer` blocks a non-allowlisted `host:port` before dial
+    (use a guard with a closed allowlist; assert no connection
+    attempt — inject a fake inner dialer that records calls).
+  - Baseline expansion: `[provider.endpoints]` hosts are auto-allowed;
+    a local ollama URL becomes an allowlist entry.
+  - Allowlist store round-trips without zero-spam corruption.
+  - `audit.jsonl` gains an `egress_block` record on a blocked dial.
+- **Grep/guard test:** fails if `http.DefaultClient` or an ad-hoc
+  `&http.Client{}` is used in provider/router/slm packages (prevents
+  regressions reopening the hole).
+- **Integration (`//go:build integration`):** with mode=enforce and a
+  minimal allowlist, a provider call to an allowed host succeeds and a
+  tool fetch to a blocked host fails with a logged violation.
+
+### Acceptance criteria
+
+1. `mode="off"` (default) → behaviour identical to today.
+2. `mode="learn"` → every outbound host appears in the store; nothing
+   is blocked.
+3. `gnoma firewall review` lists learned hosts and persists
+   allow/deny/scoped decisions.
+4. `mode="enforce"` → baseline + allowlisted hosts reachable; an
+   un-allowlisted host is blocked with an `egress_block` line in
+   `.gnoma/sessions/<id>/audit.jsonl`.
+5. `gnoma firewall audit` prints this session's firewall events
+   (block/redact/egress) in a grep-friendly form. (Closes the
+   remaining audit-log gap.)
+6. Scoped destination reachable by its named tool only.
+
+---
+
+## TODO linkage
+
+Replaces the egress half of the "Security boundary — egress controls +
+session audit log" entry in `TODO.md`. Update that entry to mark the
+audit log implemented and link this file for the egress work.
@@ -0,0 +1,224 @@
+# MiniMax Provider — 2026-06-04
+
+Adds MiniMax (<https://platform.minimax.io>) as a first-class cloud
+provider so it can register as a router arm alongside
+anthropic/openai/google/mistral. Promotes the TODO.md entry
+"MiniMax provider — cloud arm + subscription token plan" out of
+bullet form into a phased design.
+
+---
+
+## Problem
+
+Gnoma has no MiniMax adapter. MiniMax ships strong, very cheap coding
+models (M2 family) that are a natural fit for the cheap-high-capability
+cloud tier the router already reasons about via `CostWeight`. Two facts
+make the integration cheap:
+
+1. MiniMax exposes **both** an OpenAI-compatible and an
+   Anthropic-compatible HTTP surface, so no new translation layer is
+   needed — gnoma already has both `internal/provider/openaicompat`
+   (built on the OpenAI SDK) and `internal/provider/anthropic` with a
+   working `BaseURL` override.
+2. `envKeyFor`'s default branch (`cmd/gnoma/main.go:1199-1200`) already
+   resolves `MINIMAX_API_KEY` for any unknown provider with no code
+   change.
+
+The remaining work is wiring (a constructor + switch cases +
+enumerations), routing metadata (family defaults, rate limits), and a
+**design decision around the subscription billing model** that the
+router's metered-cost assumption does not currently handle.
+
+### External facts (VERIFY at implementation — MiniMax docs move fast)
+
+These were confirmed 2026-06-04 but the model lineup and pricing are
+revised frequently (a pricing overhaul landed 2026-06-02). Re-verify
+against the live docs before hardcoding anything:
+
+- **OpenAI-compatible base URL:** `https://api.minimax.io/v1`
+  (international). A separate region endpoint exists
+  (`api.minimaxi.com`); confirm the exact host + whether gnoma should
+  expose a region toggle. Docs:
+  <https://platform.minimax.io/docs/api-reference/text-openai-api>
+- **Anthropic-compatible endpoint:** exists ("two equivalent
+  endpoints, one mimics OpenAI, one mimics Anthropic"). Confirm the
+  exact path/host before choosing it over OpenAI-compat.
+- **Models (do NOT hardcode a single ID):** MiniMax-M2, M2.1, M2.5,
+  M2.7 (+ `-highspeed` variants), M3. Coding-relevant default is the
+  current M2-coding model — at time of writing M2.5 for PAYG, M2.1 for
+  the subscription plan. **Treat the default as config, not a
+  constant**, and call `Models(ctx)` to enumerate live.
+- **Pricing (PAYG, for `CostPer1k*` metadata):** M2.7 ≈ $0.30 / MTok
+  input, $1.20 / MTok output; highspeed ≈ 2×. Convert to the EUR
+  per-1k convention used by the Arm struct. Docs:
+  <https://platform.minimax.io/docs/guides/pricing-token-plan>
+- **Subscription:** "Token Plan" (current; supersedes the former
+  "Coding Plan"). Flat-rate prompt quota over a rolling window
+  (published M2.7 limits 1,500–30,000 requests / 5h across tiers).
+  Same Bearer key as PAYG.
+
+---
+
+## Non-goals
+
+- **A bespoke MiniMax SDK / translation layer.** We reuse the existing
+  OpenAI-compat (default) or Anthropic provider via `BaseURL`. If
+  MiniMax adds non-standard body fields, use the existing
+  `openai.NewWithStreamOptions` escape hatch (the same one Ollama uses).
+- **Region auto-detection.** Ship the international endpoint as the
+  default; the user can override via `[provider.endpoints]`. A region
+  toggle is a follow-up if anyone asks.
+- **Full subscription-quota accounting.** Phase 2 models subscription
+  cost as a coarse `CostWeight` zero-out, not a live quota meter.
+
+---
+
+## Decision: OpenAI-compat vs Anthropic-compat backing
+
+**Default to OpenAI-compat** (`internal/provider/openaicompat`). It is
+already exercised by the local backends (ollama/llamacpp), so the
+streaming, tool-call, and error paths are battle-tested in this repo.
+The Anthropic-compat endpoint is a fallback only if a MiniMax feature
+(e.g. extended thinking) is exposed solely through it. Keep the option
+open by making the backing selectable via config
+(`[provider.minimax].api = "openai" | "anthropic"`), defaulting to
+`openai`.
+
+---
+
+## Design
+
+### Phase 1 — provider wiring (smallest shippable slice)
+
+Goal: `gnoma --provider minimax` works against PAYG with metered
+pricing, registered as a cloud arm.
+
+1. **Constructor.** Add `NewMiniMax(cfg provider.ProviderConfig)
+   (provider.Provider, error)` to
+   `internal/provider/openaicompat/provider.go`, mirroring `NewOllama`
+   / `NewLlamaCpp` (`openaicompat/provider.go:18-49`):
+   - Default `BaseURL` to `https://api.minimax.io/v1` when unset (but
+     let `[provider.endpoints].minimax` override).
+   - Require a real API key (unlike Ollama's dummy key) — return an
+     error if `cfg.APIKey == ""`.
+   - Leave `MaxRetries` at the SDK default (cloud failures *are*
+     transient, unlike the local backends which force `0`).
+   - Default `cfg.Model` to the current coding model **read from
+     config**, not a baked constant.
+
+2. **Construction switch.** Add `case "minimax": return
+   openaicompat.NewMiniMax(cfg)` to `createProvider`
+   (`cmd/gnoma/main.go:1265-1280`). If `[provider.minimax].api =
+   "anthropic"`, route to `anthropicprov.New(cfg)` with `cfg.BaseURL`
+   set to the anthropic-compat host instead.
+
+3. **Provider enumerations.** Add `"minimax"` to:
+   - the known-providers set (`main.go:233-236`),
+   - the available-providers usage string (`main.go:1279`),
+   - NOT the local-providers set (it is a cloud arm).
+
+4. **API key (optional friendliness).** `envKeyFor`'s default already
+   yields `MINIMAX_API_KEY`. Add an explicit `case "minimax"` in
+   `envKeyFor` (`main.go:1189-1201`) only if we want alternates (e.g.
+   `MINIMAX_GROUP_ID` if the account requires a group id header —
+   VERIFY whether MiniMax needs a group id alongside the key; if so,
+   thread it through `ProviderConfig.Options`).
+
+5. **Family defaults.** Add MiniMax model families to
+   `knownFamilyDefaults` in `internal/router/defaults.go` (pattern at
+   `defaults.go:212-239`). Cloud arm → no `MaxComplexity` ceiling. Set
+   `Strengths` (`TaskGeneration`, `TaskRefactor`, `TaskDebug` are the
+   coding sweet spot) and a low `CostWeight` (~0.8–1.0 — cheap arm, so
+   the cost penalty is small) plus `CostPer1kInput/Output` from the
+   verified PAYG pricing.
+
+6. **Rate limits.** Add a `minimaxDefaults()` entry in
+   `internal/provider/ratelimits.go` (pattern at the anthropic block
+   ~`ratelimits.go:109-130`) and wire it into the `DefaultRateLimits`
+   switch. Use the published PAYG RPM/TPM; allow `[rate_limits.minimax]`
+   config overrides (the existing override path in `resolveRateLimitPools`).
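
The constructor defaults from step 1 can be sketched in isolation. This assumes a trimmed-down `ProviderConfig` with only the three fields the plan names; `resolveMiniMaxConfig` is a hypothetical helper, and the real `NewMiniMax` would go on to build the OpenAI-compat provider from the result.

```go
package main

import (
	"errors"
	"fmt"
)

// International endpoint from the plan; re-verify against live docs.
const defaultMiniMaxBaseURL = "https://api.minimax.io/v1"

// ProviderConfig is a stand-in with only the fields this sketch needs.
type ProviderConfig struct {
	BaseURL string
	APIKey  string
	Model   string
}

// resolveMiniMaxConfig applies the Phase 1 defaults. configuredModel is the
// default model read from config, so no model ID is baked into the code.
func resolveMiniMaxConfig(cfg ProviderConfig, configuredModel string) (ProviderConfig, error) {
	if cfg.APIKey == "" {
		return cfg, errors.New("minimax: API key required (set MINIMAX_API_KEY)")
	}
	if cfg.BaseURL == "" {
		// An endpoint override, if present, arrives already set in cfg.BaseURL.
		cfg.BaseURL = defaultMiniMaxBaseURL
	}
	if cfg.Model == "" {
		cfg.Model = configuredModel
	}
	return cfg, nil
}

func main() {
	cfg, err := resolveMiniMaxConfig(ProviderConfig{APIKey: "example-key"}, "MiniMax-M2")
	fmt.Println(cfg.BaseURL, cfg.Model, err)

	_, err = resolveMiniMaxConfig(ProviderConfig{}, "MiniMax-M2")
	fmt.Println(err)
}
```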
+
+### Phase 2 — subscription (Token Plan) billing model
+
+The router's `CostWeight` math assumes metered per-token pricing. Under
+a Token Plan subscription, marginal cost is ≈0 until the quota is hit,
+then requests hard-fail. Design:
+
+1. **Billing knob.** `[provider.minimax].billing = "metered" |
+   "subscription"` (default `"metered"`). In `subscription` mode, set
+   the arm's `CostWeight` to 0 (or `CostPer1k*` to 0) so the selector
+   treats MiniMax as free while quota remains.
+
+2. **Quota-exhaustion failover.** MiniMax returns a quota/429 error
+   when the plan is exhausted. Map it to the existing rate-limit
+   backoff path (`Arm.BackoffUntil`, the 429 handling that already
+   disables an arm temporarily) so the bandit fails over to the next
+   arm cleanly. This ties into the session error-recovery work landed
+   in `0d3d190`. Confirm the exact error shape MiniMax returns and add
+   a classifier in `internal/provider/errors.go`.
+
+3. **Docs.** Document both plans + the region split in
+   `docs/slm-backends.md` (or a new provider doc) and the README
+   provider list.
+
+---
+
+## Touch-points (file:line)
+
+| Change | Location |
+|---|---|
+| `NewMiniMax` constructor | `internal/provider/openaicompat/provider.go` (after `:49`) |
+| Construction switch case | `cmd/gnoma/main.go:1265-1280` |
+| Known-providers set | `cmd/gnoma/main.go:233-236` |
+| Usage string | `cmd/gnoma/main.go:1279` |
+| `envKeyFor` (optional) | `cmd/gnoma/main.go:1189-1201` |
+| Family defaults | `internal/router/defaults.go:212-239` |
+| Rate-limit defaults | `internal/provider/ratelimits.go` (+ `DefaultRateLimits` switch) |
+| Error classifier (Phase 2) | `internal/provider/errors.go` |
+| Config: `[provider.minimax]` | `internal/config/config.go` (provider section) |
+
+The `Provider` interface contract to satisfy
+(`internal/provider/provider.go:136-148`): `Stream`, `Name`, `Models`,
+`DefaultModel`. All four come free by delegating to the OpenAI-compat
+base provider.
+
+---
+
+## Testing (TDD — write first)
+
+Per CLAUDE.md: table-driven, `//go:build integration` for anything
+hitting the live API.
+
+- **Unit (no network):**
+  - `NewMiniMax` defaults: empty `BaseURL` → `https://api.minimax.io/v1`;
+    empty key → error; `[provider.endpoints].minimax` override wins.
+  - `createProvider("minimax", …)` returns a non-nil provider; unknown
+    still errors.
+  - `envKeyFor("minimax") == "MINIMAX_API_KEY"`.
+  - `defaults.go`: a MiniMax model family resolves to the expected
+    `Strengths`/`CostWeight`; `MaxComplexity == 0`.
+  - `ratelimits.go`: `DefaultRateLimits("minimax").LookupModel(...)`
+    returns the configured limits; `"*"` fallback works.
+  - Phase 2: billing=`subscription` → arm `CostWeight == 0`; the
+    quota/429 error maps to a retryable/backoff classification.
+- **Integration (`//go:build integration`, real `MINIMAX_API_KEY`):**
+  a one-shot `Stream` against the cheapest model returns tokens;
+  `Models(ctx)` enumerates a non-empty list.
+
+### Acceptance criteria
+
+1. `MINIMAX_API_KEY=… gnoma --provider minimax -p "hello"` streams a
+   response in pipe mode.
+2. With no `--provider`, MiniMax appears as a selectable router arm and
+   is chosen for a cheap generation task when `prefer` allows cloud.
+3. `gnoma providers` lists `minimax`.
+4. Phase 2: with `billing="subscription"`, the selector prefers MiniMax
+   for eligible tasks; on simulated quota-exhaustion the router fails
+   over without surfacing an error to the user.
+
+---
+
+## TODO linkage
+
+Replaces the inline "MiniMax provider" bullet in `TODO.md` (In flight).
+Link this file from that entry.
@@ -0,0 +1,328 @@
+# models.dev as source of truth for model specs & pricing — 2026-06-04
+
+Adopts **models.dev** as the objective-facts source for model names,
+context windows, output limits, modalities, capabilities, and pricing —
+feeding `provider.Capabilities` and `Arm.CostPer1k{Input,Output}` — while
+gnoma's `internal/router/defaults.go` keeps the *subjective* routing
+policy. Prices are user-overridable via config.
+
+Adds the TODO.md entry "models.dev as source of truth for model specs".
+
+Reference: <https://github.com/anomalyco/models.dev> ·
+API: `https://models.dev/api.json` (also `models.json`, `catalog.json`).
+MIT-licensed, community-contributed TOML, served as static JSON.
+
+---
+
+## Problem
+
+gnoma scatters model facts across hardcoded tables:
+
+- **Capabilities** (context window, max output, vision, tool use) are
+  baked into each provider's `Models()` — e.g.
+  `internal/provider/openai/provider.go:120-241` has per-model
+  `ContextWindow`/`MaxOutput` literals.
+- **Pricing** is largely **absent**. `Arm.CostPer1k{Input,Output}` exist
+  (`internal/router/arm.go:63-64`, used by `arm.go:96`) and there is a
+  seam to populate them — `Router.RegisterProvider(..., costs map[string]
+  [2]float64)` at `internal/router/router.go:393,418` — but it has **no
+  production caller**. Arms are built via `RegisterArm` in
+  `cmd/gnoma/main.go:527,559,932` with per-token price left at zero. So
+  the cost-aware bandit math runs on mostly-empty data today.
+- **Routing policy** (`MaxComplexity`, `Strengths`, `CostWeight`,
+  `SizeCaps`) lives in `internal/router/defaults.go:53+` as
+  benchmark-derived judgments, manually refreshed (last snapshot
+  2026-05-23).
+
+These tables drift: new models ship, prices change, gnoma's literals go
+stale. models.dev solves exactly the *objective* half of this and is
+designed to be consumed as static JSON.
+
+### The seam (this is the whole spec)
+
+models.dev supplies **facts**; gnoma keeps **opinions**. Clean split:
+
+| Field | Source after this change |
+|---|---|
+| context window, max output, modalities, tool-use, reasoning/thinking, knowledge cutoff, status (deprecated/beta) | **models.dev** → `provider.Capabilities` |
+| input/output token price | **models.dev** → `Arm.CostPer1k{Input,Output}` (with user override) |
+| `MaxComplexity`, `Strengths`, `CostWeight`, `SizeCaps`, `Disabled` | **`defaults.go` stays** — models.dev has no opinion on these |
+
+`defaults.go` is **augmented, not replaced.** It loses nothing; it gains
+accurate facts to apply its policy against.
+
+---
+
+## Non-goals
+
+- **Replacing `internal/router/defaults.go`.** The subjective routing
+  policy stays hand-curated.
+- **A live dependency on models.dev at runtime.** gnoma stays
+  offline-first: a vendored snapshot ships in the binary; refresh is
+  explicit and opt-in (no phone-home).
+- **Letting models.dev override user config.** User `[provider]` /
+  `[arms]` / price overrides always win over the dataset.
+- **Importing models.dev's TOML format.** Consume the published
+  `api.json`; don't vendor their per-model TOML tree.
+
+---
+
+## Design
+
+### Data ingestion (`internal/modelsdb`)
+
+New package owning the dataset:
+
+```
+internal/modelsdb/
+  modelsdb.go    // typed view: Lookup(provider, model) -> ModelSpec
+  schema.go      // structs matching models.dev api.json
+  snapshot.go    // //go:embed vendored snapshot (offline default)
+  refresh.go     // fetch + validate + write user-cache copy
+  convert.go     // ModelSpec -> provider.Capabilities + per-1k cost
+```
+
+- **`schema.go`** maps the models.dev shape: per-provider, per-model
+  `name`, `cost.input`/`cost.output` (USD **per million tokens**),
+  `limit.context`/`limit.output`, `modalities.input`,
+  `tool_call`/`reasoning` flags, `knowledge`, `status`.
+- **`snapshot.go`** embeds a checked-in `api.json` snapshot via
+  `//go:embed` so a fresh binary works fully offline with sane defaults.
+- **`refresh.go`** implements `gnoma models refresh`: fetch `api.json`,
+  validate, write to `~/.config/gnoma/models.dev.json`. At startup
+  gnoma loads the **user cache** when one exists and the **embedded
+  snapshot** otherwise; if both are present, the newer dataset wins.
+  User config overrides both (see below).
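
A sketch of what `schema.go` and `Lookup` might look like, built only from the field names listed above. The top-level layout (provider id, then a `models` map keyed by model id) is an assumption to check against the live `api.json`, and the fixture values are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelSpec mirrors the fields the plan lists; unknown JSON fields are ignored.
type ModelSpec struct {
	Name string `json:"name"`
	Cost struct {
		Input  float64 `json:"input"`  // USD per million tokens
		Output float64 `json:"output"` // USD per million tokens
	} `json:"cost"`
	Limit struct {
		Context int `json:"context"`
		Output  int `json:"output"`
	} `json:"limit"`
	Modalities struct {
		Input []string `json:"input"`
	} `json:"modalities"`
	ToolCall  bool `json:"tool_call"`
	Reasoning bool `json:"reasoning"`
}

// ProviderEntry and Dataset encode the ASSUMED top-level shape.
type ProviderEntry struct {
	Models map[string]ModelSpec `json:"models"`
}

type Dataset map[string]ProviderEntry

func Parse(data []byte) (Dataset, error) {
	var d Dataset
	if err := json.Unmarshal(data, &d); err != nil {
		return nil, fmt.Errorf("modelsdb: parse: %w", err)
	}
	return d, nil
}

// Lookup returns the spec for provider/model, or false when absent.
func (d Dataset) Lookup(provider, model string) (ModelSpec, bool) {
	spec, ok := d[provider].Models[model]
	return spec, ok
}

// fixture is invented test data, not a real models.dev entry.
const fixture = `{"example-provider": {"models": {"example-model": {
  "name": "Example Model", "tool_call": true, "extra_field": 1,
  "cost": {"input": 0.5, "output": 2.0},
  "limit": {"context": 128000, "output": 8192},
  "modalities": {"input": ["text", "image"]}}}}}`

func main() {
	d, err := Parse([]byte(fixture))
	if err != nil {
		panic(err)
	}
	spec, ok := d.Lookup("example-provider", "example-model")
	fmt.Println(ok, spec.Name, spec.Limit.Context, spec.Cost.Input)
}
```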
+
+### Unit & currency conversion (`convert.go`) — easy to get wrong
+
+models.dev prices are **USD per million tokens**; gnoma's
+`Arm.CostPer1k{Input,Output}` is per-1k. Two transforms, kept distinct:
+
+1. **Unit: ÷ 1000** (per-million → per-1k). Always applied,
+   currency-independent. **This step gets an explicit unit test.**
+2. **Currency: convert USD → the user's display currency** (see below).
+
+`Arm.CostPer1k*` is stored in the **user's configured currency**; the
+unit comment in `arm.go:96` is updated from "EUR per 1k" to
+"per 1k, in `[models].currency`".
+
+Capabilities map directly and are currency-independent:
+`limit.context → ContextWindow`, `limit.output → MaxOutput`,
+`tool_call → ToolUse`, `modalities.input contains image → Vision`,
+`reasoning → ThinkingModes`.
+
+### Configurable display currency + daily FX rate (`fx.go`)
+
+The display currency is **user-configurable** (USD, EUR, GBP, …).
+models.dev is the USD source of truth; conversion is layered on top:
+
+- **`[models].currency`** sets the target (default `EUR` to match the
+  historical field; `USD` is the no-op identity).
+- **Daily FX rate, fetched on launch.** On startup gnoma checks a cached
+  rate (`~/.config/gnoma/fx-rate.json`); if it is older than today
+  (date-stamped, day-granular), it fetches a fresh USD→`currency` rate
+  from a configurable FX endpoint (`[models].fx_source`), updates the
+  cache, and applies it. The fetch is **non-blocking and best-effort**:
+  on failure (offline, endpoint down) gnoma keeps the last cached rate
+  and logs a one-line notice — it never blocks launch or errors out.
+- **Disable toggle.** `[models].currency_conversion = false` turns the
+  whole feature off: **no FX fetch, no network call, prices shown in
+  USD** (models.dev native). This is also the implied state when
+  `currency = "USD"`.
+- **Rate provenance.** The cached `fx-rate.json` records the rate, the
+  date fetched, and the source, so `gnoma models` / `gnoma doctor` can
+  show "prices in EUR @ 0.92 USD→EUR (2026-06-04, ecb)" and flag a stale
+  rate. A user may also pin a **fixed rate** (`[models].fx_rate = 0.92`)
+  to skip fetching entirely while still displaying a non-USD currency.
+
+FX rate precedence (highest first): **pinned `fx_rate` → today's cached
+fetch → last good cached fetch → `1.0` (USD identity) with a warning**.
+The FX endpoint host joins the egress allowlist baseline alongside
+`models.dev`.
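
The precedence chain can be written as one pure function evaluated after any fetch attempt has finished. `CachedRate`, `ResolveRate`, and the source labels are hypothetical names for this sketch; treating a non-positive rate as "unset" is also an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// CachedRate is a stand-in for the contents of fx-rate.json.
type CachedRate struct {
	Rate float64
	Date string // day-granular stamp, YYYY-MM-DD
}

// ResolveRate applies: pinned -> today's cache -> last good cache -> 1.0.
// The returned label lets callers show provenance or warn.
func ResolveRate(pinned float64, cache *CachedRate, today string) (float64, string) {
	switch {
	case pinned > 0:
		return pinned, "pinned"
	case cache != nil && cache.Rate > 0 && cache.Date == today:
		return cache.Rate, "cached-today"
	case cache != nil && cache.Rate > 0:
		return cache.Rate, "cached-stale"
	default:
		// No usable rate: fall back to USD identity; the caller should warn.
		return 1.0, "identity"
	}
}

func main() {
	today := time.Now().Format("2006-01-02")
	rate, src := ResolveRate(0, &CachedRate{Rate: 0.92, Date: today}, today)
	fmt.Println(rate, src)

	rate, src = ResolveRate(0, nil, today)
	fmt.Println(rate, src)
}
```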
+
+### Wiring into arm construction
+
+The existing seam is `RegisterProvider(..., costs)` (`router.go:393`).
+Two integration options (Open Questions):
+
+- **A (preferred):** at arm registration in `cmd/gnoma/main.go:527+`,
+  enrich each arm from `modelsdb.Lookup(provider, model)` — set
+  `CostPer1k*` from the converted price and **fill any zero-valued
+  Capabilities** the provider's `Models()` didn't supply. Provider
+  `Models()` literals become a fallback for models models.dev doesn't
+  list, not the primary source.
+- **B:** route everything through `RegisterProvider`'s `costs` map by
+  building it from `modelsdb`. Cleaner but requires switching `main.go`
+  off direct `RegisterArm`.
+
+Either way, **`defaults.go` applies on top unchanged** (longest-prefix
+family match for `MaxComplexity`/`Strengths`/`CostWeight`).
+
+### User-configurable cost (required)
+
+Prices are not one-size-fits-all: subscription plans make marginal cost
+~0 until quota (the MiniMax Token Plan case in the provider TODO;
+formerly "Coding Plan"), negotiated enterprise rates differ, and local
+models are free. The models.dev price is the **default**, overridable
+per arm:
+
+```toml
+[models]
+refresh = "manual"             # manual | never  (never = embedded snapshot only)
+currency = "EUR"               # display currency; USD = identity (no conversion)
+currency_conversion = true     # false → no FX fetch, prices shown in USD
+fx_source = "https://..."      # daily USD→currency rate endpoint (egress-allowlisted)
+# fx_rate = 0.92               # optional: pin a fixed rate, skip daily fetch
+
+# Per-arm / per-model price override — wins over models.dev.
+# Override prices are interpreted in [models].currency.
+[[provider.cost]]
+arm = "minimax/MiniMax-M2"
+billing = "subscription"       # zeroes marginal cost while quota remains
+# or explicit metered numbers (per 1k, in [models].currency):
+[[provider.cost]]
+arm = "anthropic/claude-..."
+input_per_1k  = 0.0028
+output_per_1k = 0.014
+```
+
+Precedence (highest first): **user `[[provider.cost]]` override →
+models.dev (unit-converted + currency-converted) → provider `Models()`
+fallback → zero**. Both input *and* output prices flow through the same
+unit ÷1000 and currency conversion. The `billing = "subscription"` knob
+ties into the open MiniMax billing question (TODO "MiniMax provider"):
+it zeroes the `CostWeight`-effective cost while quota remains, and once
+the quota is exhausted (429) the router fails over to the next arm.
+Local arms (`IsLocal`) default to zero cost regardless of dataset.
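
The price precedence can be sketched as a resolver that also returns the source label `gnoma models` would display. The types and the `local`/`none` labels are hypothetical, and placing the user override ahead of the local-arm rule is an assumption; the plan only says local arms are free "regardless of dataset".

```go
package main

import "fmt"

// Price is per 1k tokens, already in the display currency.
type Price struct{ In, Out float64 }

// Override is a stand-in for one [[provider.cost]] entry.
type Override struct {
	Billing string // "subscription" zeroes marginal cost
	Price   *Price // explicit metered numbers, if given
}

// EffectivePrice resolves the price for one arm and names its source.
// dataset is the converted models.dev price; fallback comes from Models().
func EffectivePrice(isLocal bool, ov *Override, dataset, fallback *Price) (Price, string) {
	if ov != nil {
		if ov.Billing == "subscription" {
			return Price{}, "override"
		}
		if ov.Price != nil {
			return *ov.Price, "override"
		}
	}
	if isLocal {
		return Price{}, "local"
	}
	if dataset != nil {
		return *dataset, "models.dev"
	}
	if fallback != nil {
		return *fallback, "fallback"
	}
	return Price{}, "none"
}

func main() {
	dataset := &Price{In: 0.000276, Out: 0.001104}

	p, src := EffectivePrice(false, nil, dataset, nil)
	fmt.Println(p, src)

	p, src = EffectivePrice(false, &Override{Billing: "subscription"}, dataset, nil)
	fmt.Println(p, src)
}
```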
+
+### Offline-first & egress
+
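+- The embedded snapshot means **zero network calls for the model
+  dataset** unless the user runs `gnoma models refresh`. The daily FX
+  fetch is the one other outbound call; it is skipped when
+  `currency_conversion = false`, `currency = "USD"`, or `fx_rate` is
+  pinned.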
+- `models.dev` becomes a curated host in the egress allowlist baseline
+  (`2026-06-04-egress-allowlist.md` ships package + provider hosts; add
+  `models.dev`), so even refresh stays inside the firewall policy.
+- `gnoma doctor` (shipped `cmd/gnoma/doctor_cmd.go`) gains a check:
+  snapshot age, models referenced in config but absent from the dataset,
+  and prices that look stale vs the dataset.
+
+### Surfacing
+
+- `gnoma models` lists resolved arms with their effective price + caps +
+  source (`models.dev` / `override` / `fallback`) — analogous to
+  `gnoma providers`.
+- The TUI status line / model picker can show context window and
+  price-per-turn estimates now that the data is reliable
+  (`internal/tui/rendering.go:551-620`, ties to the TUI/UX plan).
+
+---
+
+## Touch-points (file:line)
+
+| Change | Location |
+|---|---|
+| New dataset package | new `internal/modelsdb/` |
+| Embedded snapshot | `internal/modelsdb/snapshot.go` (`//go:embed api.json`) |
+| Daily FX fetch + cache | new `internal/modelsdb/fx.go`, `~/.config/gnoma/fx-rate.json`, called on launch near config load `cmd/gnoma/main.go:131-166` |
+| `gnoma models` / `models refresh` subcommand | `cmd/gnoma/main.go:179-196`; new `cmd/gnoma/models_cmd.go` |
+| Capabilities struct (target) | `internal/provider/provider.go:94` |
+| Per-model cap literals (become fallback) | `internal/provider/openai/provider.go:120-241` (+ peers) |
+| Cost fields + math | `internal/router/arm.go:63-64,96` |
+| Cost seam | `internal/router/router.go:393,418` |
+| Arm enrichment at registration | `cmd/gnoma/main.go:527,559,932` |
+| Routing policy (unchanged, applied on top) | `internal/router/defaults.go:53+` |
+| Config: `[models]`, `[[provider.cost]]` | `internal/config/config.go` |
+| doctor checks (snapshot + FX-rate staleness) | `cmd/gnoma/doctor_cmd.go`, `internal/config/doctor.go` |
+| Egress hosts (`models.dev` + `fx_source`) | `2026-06-04-egress-allowlist.md` baseline |
+
+---
+
+## Testing (TDD — write first)
+
+- **Schema parse:** `api.json` (a fixture slice) unmarshals into
+  `schema.go` structs; unknown fields ignored; missing optional fields
+  tolerated.
+- **Unit conversion (critical):** a known models.dev entry (USD/million)
+  converts to the expected USD/1k — guards the ÷1000 step independently
+  of currency.
+- **Currency conversion:** USD/1k → EUR/1k given a rate; `currency="USD"`
+  and `currency_conversion=false` are both identity (no conversion,
+  prices in USD); a pinned `fx_rate` is used verbatim. Output and input
+  prices both convert.
+- **Daily FX fetch:** a cache dated today is reused (no fetch); a stale
+  cache triggers a fetch against a stub endpoint and updates the cache;
+  a failed fetch falls back to the last good cached rate (and to `1.0`
+  with a warning if none) — launch never blocks or errors.
+- **Capability mapping:** `tool_call`→`ToolUse`, image modality→`Vision`,
+  `limit.context`→`ContextWindow`, `reasoning`→`ThinkingModes`.
+- **Override precedence:** user `[[provider.cost]]` beats models.dev;
+  models.dev beats provider fallback; `billing="subscription"` zeroes
+  marginal cost; `IsLocal` arms are free regardless of dataset.
+- **defaults.go untouched:** an arm enriched from models.dev still gets
+  its `MaxComplexity`/`Strengths`/`CostWeight` from the family table
+  (longest-prefix match), and a model *absent* from models.dev still
+  works via provider `Models()` fallback.
+- **Offline:** with no user cache and network blocked, the embedded
+  snapshot fully populates arms (no network call attempted).
+- **Refresh:** `models refresh` against a stub server writes a valid
+  user cache; a malformed response is rejected and the prior cache /
+  snapshot is retained (no corruption).
+- **doctor:** flags a config-referenced model missing from the dataset
+  and a stale snapshot.
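
As a rough sketch of the two conversions these tests guard, the ÷1000 step and the currency step can be kept as separate pure functions so each is testable on its own. The names `PerMillionToPer1k`, `FXConfig`, and `ToDisplay` are illustrative assumptions, not the identifiers in `internal/router`:

```go
package main

// PerMillionToPer1k converts a price quoted per million tokens
// (the models.dev unit) into a price per 1k tokens (the Arm unit).
func PerMillionToPer1k(perMillion float64) float64 {
	return perMillion / 1000
}

// FXConfig is a hypothetical stand-in for the relevant config keys.
type FXConfig struct {
	Currency           string  // display currency, e.g. "EUR"
	CurrencyConversion bool    // false disables conversion entirely
	Rate               float64 // display-currency units per 1 USD
}

// ToDisplay converts a USD price into the display currency.
// A USD display currency and disabled conversion are both the identity.
func ToDisplay(usdPer1k float64, cfg FXConfig) float64 {
	if !cfg.CurrencyConversion || cfg.Currency == "USD" {
		return usdPer1k
	}
	return usdPer1k * cfg.Rate
}
```

A models.dev price of 3.00 USD per million tokens becomes 0.003 USD per 1k, and the same call with `currency="USD"` or `currency_conversion=false` returns that value unchanged whatever the rate is.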
+
+### Acceptance criteria
+
+1. A fresh binary populates context window, max output, vision, tool-use,
+   and price for known models **offline** from the embedded snapshot.
+2. `gnoma models` shows each arm's effective caps + price + source.
+3. `gnoma models refresh` updates the dataset within the egress policy;
+   offline default unchanged without it.
+4. User `[[provider.cost]]` overrides (explicit price or
+   `billing="subscription"`) win over models.dev; local arms are free.
+5. `internal/router/defaults.go` policy still applies on top, unchanged.
+6. A model not in models.dev still works via the provider's `Models()`
+   fallback.
+7. Unit (÷1000) and currency conversion are correct and unit-tested.
+8. Display currency is user-configurable; the FX rate is fetched daily on
+   launch (best-effort, non-blocking), cached, and shown with provenance.
+9. `currency_conversion = false` (or `currency = "USD"`) disables the FX
+   fetch entirely and shows prices in USD.
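
The precedence in criterion 4 could be resolved in one small function. This is a sketch under assumptions: `Price`, `CostOverride`, and `EffectivePrice` are hypothetical names, and placing the local-arm check ahead of a user override is one reading of "local arms are free", not something the plan fixes:

```go
package main

// Price is a per-1k-token price pair; Known reports whether the source had data.
type Price struct {
	In, Out float64
	Known   bool
}

// CostOverride is a hypothetical shape for one [[provider.cost]] entry.
type CostOverride struct {
	Billing string // "subscription" zeroes marginal cost
	Price   Price
}

// EffectivePrice resolves the price for one arm and names its source,
// which is the provenance `gnoma models` would display.
func EffectivePrice(isLocal bool, user *CostOverride, modelsDev, fallback Price) (Price, string) {
	switch {
	case isLocal:
		return Price{Known: true}, "local"
	case user != nil && user.Billing == "subscription":
		return Price{Known: true}, "user (subscription)"
	case user != nil && user.Price.Known:
		return user.Price, "user"
	case modelsDev.Known:
		return modelsDev, "models.dev"
	default:
		return fallback, "provider fallback"
	}
}
```

Returning the source string alongside the price keeps the "price + source" column of criterion 2 cheap to render.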
+
+---
+
+## Open questions (resolve at implementation)
+
+- **FX rate source** — which `fx_source` endpoint ships as the default
+  (ECB daily reference rates are free, EUR-based, no key; others need an
+  API key). Pick a keyless default; document overriding it. The daily
+  cadence is day-granular (date-stamped cache), not intraday.
+- **Currency field unit** — `Arm.CostPer1k*` now stores the user's
+  display currency (was nominally EUR). Confirm no other code assumes the
+  field is EUR; update the `arm.go:96` comment. Cost-comparison math in
+  the bandit is currency-agnostic (all arms share one currency) so
+  selection is unaffected.
+- **Integration point** — enrich arms in-place at `main.go` (Option A,
+  preferred, smaller diff) vs route through `RegisterProvider`'s `costs`
+  map (Option B, cleaner seam). Decide when touching `main.go`.
+- **Endpoint choice** — `api.json` (full) vs `models.json` (provider-
+  agnostic) vs `catalog.json`. Lean `api.json`; the snapshot makes size
+  a non-issue.
+- **Refresh cadence** — manual-only (chosen, no-phone-home posture) vs an
+  opt-in periodic check. Default manual; never auto.
+- **Snapshot freshness in CI** — whether a CI job re-vendors the embedded
+  `api.json` on a schedule so shipped binaries don't drift. Likely yes;
+  separate chore.
+- **MaxComplexity from benchmarks** — models.dev has no complexity
+  opinion; if it ever adds benchmark data, revisit whether `defaults.go`
+  could derive `MaxComplexity`. Out of scope now.
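
Whichever `fx_source` is chosen, the day-granular cache and its fallbacks from the first question above might look like the following. This is a minimal sketch: `RateCache`, `ResolveRate`, and the injected `fetch` function are assumptions, and a real caller would run it off the launch path (goroutine or short timeout) so that launch never blocks:

```go
package main

import "errors"

// RateCache is a hypothetical date-stamped cache entry.
type RateCache struct {
	Date string  // "YYYY-MM-DD" of the last successful fetch; "" if none
	Rate float64 // display-currency units per 1 USD
}

// errOffline is a sample failure a stubbed fetch can return.
var errOffline = errors.New("fx endpoint unreachable")

// ResolveRate returns the cache to use today and whether to warn.
// Fresh cache: reuse without fetching. Stale cache: try one fetch.
// Failed fetch: keep the last good rate, or fall back to 1.0 with a warning.
func ResolveRate(cache RateCache, today string, fetch func() (float64, error)) (RateCache, bool) {
	if cache.Date == today {
		return cache, false
	}
	rate, err := fetch()
	if err == nil && rate > 0 {
		return RateCache{Date: today, Rate: rate}, false
	}
	if cache.Date != "" {
		return cache, false // stale, but still the last good rate
	}
	return RateCache{Rate: 1.0}, true // nothing cached: identity rate plus warning
}
```

Leaving `Date` empty on the 1.0 fallback means the next launch tries the fetch again instead of treating the placeholder as a good rate.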
+
+---
+
+## TODO linkage
+
+New "models.dev as source of truth for model specs" entry in `TODO.md`
+(In flight) links here. Augments (does not replace) `defaults.go`:
+models.dev supplies objective facts → `provider.Capabilities` +
+`Arm.CostPer1k*`; prices are user-overridable via `[[provider.cost]]`
+(intersects the MiniMax subscription-billing question); display currency
+is configurable with a daily best-effort FX rate fetched on launch
+(disable → USD); offline-first via an embedded snapshot; `models.dev` and
+the FX source join the egress allowlist baseline.
@@ -0,0 +1,312 @@
+# Multi-Agent Engineering Forge (MAEF) — 2026-06-04
+
+A deterministic, language-agnostic pipeline orchestrator that decouples
+**Context Mapping → Code Generation → Deterministic Validation →
+Cross-Vendor Critique** into a stateful state machine with strict
+programmatic gates and loop-back. Shipped as `gnoma forge`.
+
+Adds the TODO.md entry "Multi-Agent Engineering Forge (MAEF)".
+
+---
+
+## Problem
+
+gnoma's single-turn agentic loop (`internal/engine/loop.go:88` `runLoop`)
+is excellent for interactive work but couples four concerns the user's
+MAEF spec wants separated: planning, generation, deterministic
+validation, and semantic critique. The MAEF design's core claim is that
+**transitions between stages are governed by programmatic gates, not LLM
+choices** — a state machine, not a mega-prompt. That maps almost exactly
+onto machinery gnoma already owns; the only genuinely new package is the
+sandbox.
+
+The mapping (this is the whole spec — reuse, don't duplicate):
+
+| MAEF concept | gnoma reality |
+|---|---|
+| Deterministic orchestrator with programmatic gates | A **Go state machine** in new `internal/forge` — not an LLM, not the engine's tool-driven loop |
+| Agent 1 Context Planner (LLM) | An **elf** (`elf.Manager.SpawnWithProvider`, `internal/elf/manager.go:153`), read-only tools, JSON output |
+| Agent 2 Forge Agent (LLM) | An **elf** that emits a unified diff (`diff -u`) as text |
+| Agent 3 Sandbox Gate (**non-LLM**) | A plain Go function over a new `internal/sandbox` — **not** an elf |
+| Agent 4 Adversarial Critic (LLM) | An **elf pinned to a different vendor/arm** than Forge (`router.ForceArm`) |
+| Unified Model Intermediary | gnoma's existing `provider.Provider` + `router` |
+| Ephemeral Docker workspace | git-**worktree** default; docker an optional backend behind one interface |
+
+The LLM stages are elfs (each its own `engine.Engine`, system prompt,
+and routed arm). The gates between them are deterministic Go. Making
+that split explicit is what keeps this from becoming a parallel system
+bolted next to the engine.
+
+---
+
+## Non-goals
+
+- **Replacing the interactive TUI / pipe modes.** `gnoma forge` is a new
+  batch/headless entry mode alongside them.
+- **Replacing the engine's `runLoop`.** Each elf still runs the normal
+  loop internally; MAEF orchestrates *between* elfs.
+- **A general workflow engine.** The pipeline is fixed (Plan → Forge →
+  Sandbox → Critic with loop-back); arbitrary DAGs are out of scope.
+- **Docker as a hard dependency.** Worktree is the default backend so the
+  static-binary, no-daemon posture holds; docker is opt-in.
+- **LLM-driven control flow.** Stage transitions are Go code with status
+  codes, never a model deciding "what next".
+
+---
+
+## Design
+
+### Entry mode: `gnoma forge`
+
+New subcommand following the established dispatch pattern
+(`cmd/gnoma/main.go:179-196`, peers `doctor`/`config`/`router`): add
+`case "forge": os.Exit(runForgeCommand(...))` and a `forge_cmd.go`.
+Inputs: a spec (file or stdin) + the user prompt. Reuses the same
+config/router/security/elf-manager construction as TUI/pipe; only the
+front-end orchestration differs.
+
+```
+gnoma forge --spec ./spec.md "add rate-limit middleware to the auth router"
+gnoma forge --spec ./spec.md --max-iters 5 --critic-arm anthropic/...
+```
+
+### Package layout
+
+```
+internal/forge/
+  forge.go       // state machine: states, transitions, the run loop
+  planner.go     // Stage 1 elf: context map (read-only tools, JSON out)
+  forger.go      // Stage 2 elf: emit unified diff
+  critic.go      // Stage 4 elf: semantic critique, cross-vendor arm
+  state.go       // Iteration state, feedback history, terminal-failure handling
+  prompts.go     // System prompts per stage (constraints from MAEF §2)
+internal/sandbox/
+  sandbox.go     // Sandbox interface (the only genuinely new abstraction)
+  worktree.go    // default backend: git worktree + host exec
+  docker.go      // optional backend (build tag / config-gated)
+  config.go      // WorkspaceConfiguration contract (setup/validate/test)
+```
+
+The Stage-3 gate is a function in `forge.go` that calls `internal/sandbox`
+— deliberately **not** a file in the elf/agent layer, to keep "non-LLM"
+honest.
+
+### The state machine (`forge.go`)
+
+States and the **programmatic** transitions between them:
+
+```
+PLAN ─► FORGE ─► SANDBOX ─┬─[exit≠0]─► FORGE   (sandbox_error, bypass critic)
+                          └─[exit=0]─► CRITIC ─┬─[reject]─► FORGE (critic_critique)
+                                               └─[APPROVED]─► DONE
+guards: iter < max_iters; patch applies cleanly; worktree state consistent
+terminal failures ─► ABORT (revert worktree to last good commit)
+```
+
+- **Gate after Sandbox:** if the sandbox exit code is non-zero, capture
+  stdout/stderr verbatim and route it back to Forge as a priority
+  `sandbox_error` — **the Critic is bypassed entirely** (MAEF §2.3). On
+  exit 0, package the applied diff + logs and advance to Critic.
+- **Gate after Critic:** `STATUS: APPROVED` (exact sentinel) → DONE; any
+  other output is parsed as a `critic_critique` and looped back to Forge.
+- **Loop budget:** hard `--max-iters` ceiling (default 5) so the pipeline
+  always terminates. Each iteration carries the feedback history forward
+  (`state.go`), and the Forge prompt is instructed to prioritise the most
+  recent `sandbox_error` / `critic_critique` over new additions
+  (MAEF §2.2).
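
The gates above reduce to a pure transition function, which is what makes them testable without an LLM. The sketch below assumes a hypothetical `Event` struct, a 1-based `iter` counter, `Abort` as the outcome when the budget is exhausted, and whitespace trimming around the sentinel; none of these are fixed by this plan:

```go
package main

import "strings"

type State int

const (
	Plan State = iota
	Forge
	Sandbox
	Critic
	Done
	Abort
)

// Event is what the stage that just ran reported (hypothetical shape).
type Event struct {
	ExitCode     int    // sandbox exit code
	CriticOutput string // raw critic text
	Terminal     bool   // a failure the orchestrator treats as terminal
}

const approved = "STATUS: APPROVED"

// Next is the programmatic gate: no model output chooses the next state,
// only exit codes, the sentinel string, and the iteration budget.
func Next(s State, ev Event, iter, maxIters int) State {
	if ev.Terminal {
		return Abort
	}
	switch s {
	case Plan:
		return Forge
	case Forge:
		return Sandbox
	case Sandbox:
		if ev.ExitCode != 0 {
			return loopBack(iter, maxIters) // sandbox_error: Critic is bypassed
		}
		return Critic
	case Critic:
		if strings.TrimSpace(ev.CriticOutput) == approved {
			return Done
		}
		return loopBack(iter, maxIters) // critic_critique
	}
	return s // Done and Abort are absorbing
}

// loopBack returns to Forge while budget remains; iter counts from 1.
func loopBack(iter, maxIters int) State {
	if iter >= maxIters {
		return Abort
	}
	return Forge
}
```

Because `Next` takes no elf or sandbox handle, the state-machine tests can drive it with scripted `Event` values alone.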
+
+### Stage 1 — Context Planner (elf)
+
+Spawned via `manager.Spawn(ctx, taskType, prompt, plannerSystemPrompt, maxTurns)`
+(`internal/elf/manager.go:65`) with **read-only tools only** (`fs.read`,
+grep/glob), gated through the allowed-tools setting on `TurnOptions`
+(`internal/engine/loop.go`). The system prompt (`prompts.go`)
+enforces the MAEF §2.1 constraints: do not write code; emit JSON with
+`targets` / `dependencies` / `rationale`. The output is parsed against a
+schema; a malformed map is retried, then treated as a terminal failure.
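
A possible shape for the parse-then-retry step, as a sketch only: the plan names the three keys but not their types, so treating `targets` and `dependencies` as string lists, requiring at least one target, and the `PlanWithRetry` helper are all assumptions:

```go
package main

import (
	"encoding/json"
	"errors"
)

// ContextMap mirrors the three keys the Planner prompt asks for.
// The field types are assumed, not specified by the plan.
type ContextMap struct {
	Targets      []string `json:"targets"`
	Dependencies []string `json:"dependencies"`
	Rationale    string   `json:"rationale"`
}

// ParseContextMap rejects non-JSON output and maps without any target.
func ParseContextMap(raw []byte) (ContextMap, error) {
	var m ContextMap
	if err := json.Unmarshal(raw, &m); err != nil {
		return ContextMap{}, err
	}
	if len(m.Targets) == 0 {
		return ContextMap{}, errors.New("context map has no targets")
	}
	return m, nil
}

// PlanWithRetry asks the planner up to `attempts` times and returns the
// first well-formed map; the last parse error is the terminal failure.
func PlanWithRetry(ask func() []byte, attempts int) (ContextMap, error) {
	lastErr := errors.New("no planner attempts made")
	for i := 0; i < attempts; i++ {
		m, err := ParseContextMap(ask())
		if err == nil {
			return m, nil
		}
		lastErr = err
	}
	return ContextMap{}, lastErr
}
```

Here `ask` stands in for one Planner elf run; injecting it keeps the retry logic testable with canned replies.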
+
+### Stage 2 — Forge Agent (elf)
+
+Ingests the context map + source of mapped files + spec + accumulated
+feedback. System prompt enforces MAEF §2.2: **emit only a unified diff**
+(`diff -u`), no prose, never a full file when a partial edit suffices.
+The diff is **applied via `git apply` inside the sandbox worktree** —
+*not* the `fs.edit` string-replace tool (`internal/tool/fs/edit.go`).
+This matches the user's `diff -u` contract and is atomic/cleanly
+reversible. A corrupt patch is rejected immediately and the raw
+`git apply` error is fed straight back to Forge (MAEF §2.3 rule 1).
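
Before the patch reaches `git apply`, a cheap shape check can reject prose outright. The heuristic below is an assumption for illustration (the name `LooksLikeUnifiedDiff` is hypothetical), and `git apply` remains the real authority on whether a patch is valid:

```go
package main

import "strings"

// LooksLikeUnifiedDiff reports whether the Forge output starts like a
// unified diff and contains old/new file headers followed by a hunk.
// It is a pre-filter only; it does not validate hunk contents.
func LooksLikeUnifiedDiff(out string) bool {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	first := lines[0]
	if !strings.HasPrefix(first, "--- ") && !strings.HasPrefix(first, "diff ") {
		return false // leading prose, or empty output
	}
	var sawOld, sawNew, sawHunk bool
	for _, l := range lines {
		switch {
		case strings.HasPrefix(l, "--- "):
			sawOld = true
		case strings.HasPrefix(l, "+++ "):
			sawNew = sawOld // the new-file header must follow an old-file header
		case strings.HasPrefix(l, "@@ "):
			sawHunk = sawNew // a hunk must follow both headers
		}
	}
	return sawOld && sawNew && sawHunk
}
```

Output that fails this check never needs a worktree at all; output that passes it can still be rejected by `git apply`, whose raw error is what goes back to Forge.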
+
+### Stage 3 — Deterministic Sandbox Gate (non-LLM)
+
+A Go function, not an elf. Backed by `internal/sandbox`:
+
+```go
+type Sandbox interface {
+    Apply(patch []byte) error           // git apply in the workspace
+    Run(step string) (Result, error)    // setup / validate / test command
+    Revert() error                      // back to last good commit
+    WorkDir() string
+    Cleanup() error
+}
+```
+
+- **Default backend `worktree.go`:** create a detached git worktree off
+  the current commit (`git worktree add`), apply the patch there, and run
+  the lifecycle commands on the host. This fits the static-binary,
+  no-daemon posture and is the same isolation primitive the agent harness
+  itself uses. On terminal failure the worktree is reset or removed
+  (`git worktree remove`). This is the user's infinite-loop guard:
+  state-sync errors are terminal and revert to the last good commit.
+- **Optional backend `docker.go`:** the same interface over an ephemeral
+  container, gated by config/build-tag, honouring the user's
+  `WorkspaceConfiguration` YAML (`base_image`, `setup`, `validate`,
+  `test`). Swapping backends never touches `forge.go`.
+- **Lifecycle contract (`config.go`)** mirrors the MAEF YAML:
+  `setup` (e.g. `go mod download` / `npm ci`), `validate`
+  (`go vet` / `cargo check` / `npm run lint`), `test`
+  (`go test ./...` / `jest --findRelatedTests`). Language-agnostic —
+  commands come from `[forge.sandbox]` config or are auto-detected from
+  the project (reuse the `SessionStart` project-type detection already in
+  the repo).
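
Since `forge.go` only sees the interface, a scripted fake is enough to exercise the gate. The sketch below repeats the interface so it compiles alone; the `Result` fields, `FakeSandbox`, and `RunLifecycle` are assumptions, as the plan does not define them:

```go
package main

import "errors"

// Result is a hypothetical shape for what Run returns.
type Result struct {
	ExitCode int
	Output   string
}

// Sandbox repeats the interface from this plan.
type Sandbox interface {
	Apply(patch []byte) error
	Run(step string) (Result, error)
	Revert() error
	WorkDir() string
	Cleanup() error
}

// FakeSandbox scripts an exit code per lifecycle step for tests.
type FakeSandbox struct {
	ExitCodes map[string]int // missing steps exit 0
	Applied   [][]byte
	Reverted  int
}

var _ Sandbox = (*FakeSandbox)(nil) // compile-time interface check

func (f *FakeSandbox) Apply(patch []byte) error {
	if len(patch) == 0 {
		return errors.New("empty patch")
	}
	f.Applied = append(f.Applied, patch)
	return nil
}

func (f *FakeSandbox) Run(step string) (Result, error) {
	return Result{ExitCode: f.ExitCodes[step], Output: step + " ran"}, nil
}

func (f *FakeSandbox) Revert() error   { f.Applied = nil; f.Reverted++; return nil }
func (f *FakeSandbox) WorkDir() string { return "/tmp/fake-worktree" }
func (f *FakeSandbox) Cleanup() error  { return nil }

// RunLifecycle runs setup, validate, test in order and stops at the
// first non-zero exit, returning that step's result.
func RunLifecycle(sb Sandbox) (Result, error) {
	var last Result
	for _, step := range []string{"setup", "validate", "test"} {
		res, err := sb.Run(step)
		if err != nil {
			return res, err
		}
		last = res
		if res.ExitCode != 0 {
			break
		}
	}
	return last, nil
}
```

The worktree and docker backends would satisfy the same interface, so `RunLifecycle` and the gate never need to know which one is active.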
+
+### Stage 4 — Adversarial Critic (elf, **cross-vendor**)
+
+The headline of the user's spec. The Critic must be a **different
+vendor/arm than the Forge** so the critique is genuinely independent, not
+the same model grading itself.
+
+- Spawn via `manager.SpawnWithProvider(prov, model, …)`
+  (`internal/elf/manager.go:153`) with the arm chosen by
+  `router.ForceArm` (`internal/router/router.go:147`) so forge-arm ≠
+  critic-arm is **enforced**, not hoped for. If only one vendor is
+  configured, log a clear degraded-mode warning (critique still runs,
+  independence not guaranteed).
+- Inputs: original spec, applied patch, sandbox logs. System prompt
+  enforces MAEF §2.4: **forbidden from writing code/patches**; evaluates
+  performance, security surface, spec alignment; emits structured
+  markdown pointers or the exact sentinel `STATUS: APPROVED`.
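
The vendor guard itself is small. In this sketch `Arm` is a hypothetical two-field stand-in for the router's arm type, the candidate list is assumed to be ordered best-first, and the chosen arm would then be pinned with `router.ForceArm`:

```go
package main

// Arm is a hypothetical stand-in for the router's arm type.
type Arm struct {
	Vendor string // e.g. "anthropic"
	Model  string
}

// PickCriticArm returns the first arm whose vendor differs from the
// Forge arm's. If none exists it returns the Forge arm itself with
// degraded=true, so the caller can log the degraded-mode warning.
func PickCriticArm(forge Arm, arms []Arm) (critic Arm, degraded bool) {
	for _, a := range arms {
		if a.Vendor != forge.Vendor {
			return a, false
		}
	}
	return forge, true
}
```

Comparing on vendor rather than on model is what blocks a same-vendor sibling model from counting as an independent critic.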
+
+### Security & safety interplay
+
+The sandbox runs **AI-generated patches and tests** — a real execution
+surface. All existing boundaries still apply:
+
+- `safety.ClassifyCWD` runs before the forge starts; a `refuse`
+  classification aborts.
+- Every elf's provider is `security.WrapProvider`-wrapped
+  (`internal/security/safeprovider.go:33`) exactly like interactive arms,
+  so firewall + audit + egress allowlist
+  (`2026-06-04-egress-allowlist.md`) hold across all stages.
+- Sandbox command execution goes through the same `permission` /
+  validation discipline as the `bash` tool
+  (`internal/tool/bash/bash.go` `ValidateCommand`); in headless forge
+  mode the permission posture is config-driven (default: deny network in
+  sandbox unless the lifecycle commands need a declared host).
+- Terminal state-sync failures **revert the worktree** and abort rather
+  than looping — directly addresses the MAEF §3 infinite-error-loop risk.
+
+### Unified Model Intermediary
+
+The MAEF "unified completion interface" already exists as
+`provider.Provider` (`internal/provider/provider.go:136`) behind the
+router. MiniMax / Anthropic / local Ollama (the user's diagram's three
+backends) are just arms. No new abstraction — `prompts.go` + the elf's
+`request` is the `request_completion(system, prompt, schema)` surface.
+
+---
+
+## Touch-points (file:line)
+
+| Change | Location |
+|---|---|
+| `forge` subcommand dispatch | `cmd/gnoma/main.go:179-196`; new `cmd/gnoma/forge_cmd.go` |
+| State machine + gates | new `internal/forge/forge.go`, `state.go` |
+| Planner / Forger / Critic elfs | new `internal/forge/{planner,forger,critic,prompts}.go` |
+| Elf spawn (generic + arm-pinned) | `internal/elf/manager.go:65,153` |
+| Cross-vendor enforcement | `internal/router/router.go:147` (`ForceArm`) |
+| Read-only tool gating for Planner | `internal/engine/loop.go` `TurnOptions` (AllowedTools) |
+| Sandbox abstraction | new `internal/sandbox/{sandbox,worktree,docker,config}.go` |
+| Patch apply (git, not fs.edit) | `internal/sandbox/worktree.go` (`git apply`) |
+| Command validation reuse | `internal/tool/bash/bash.go` `ValidateCommand` |
+| CWD classification | `internal/safety` `ClassifyCWD` |
+| Provider wrapping | `internal/security/safeprovider.go:33` |
+| Config section | `internal/config/config.go` (new `[forge]` + `[forge.sandbox]`) |
+
+---
+
+## Testing (TDD — write first)
+
+- **State machine (no LLM, no real sandbox):** drive `forge.go` with a
+  stub planner/forger/critic and a fake sandbox returning scripted exit
+  codes. Assert:
+  - sandbox exit≠0 routes back to Forge and **bypasses** Critic;
+  - sandbox exit=0 advances to Critic;
+  - Critic `STATUS: APPROVED` → DONE; any other output → loop to Forge;
+  - `--max-iters` is a hard ceiling (terminates, returns last state);
+  - a corrupt patch / worktree desync is **terminal** → revert + abort,
+    never an infinite loop.
+- **Sandbox (worktree backend):** in a `t.TempDir()` git repo, apply a
+  valid patch (succeeds), a corrupt patch (clean rejection with raw
+  error surfaced), run a failing `validate` (non-zero captured), and a
+  passing one; `Revert` restores the last good commit.
+- **Cross-vendor guard:** with two arms configured, assert forge-arm ≠
+  critic-arm; with one arm, assert the degraded-mode warning fires and
+  the pipeline still runs.
+- **Planner schema:** valid JSON parses into `targets`/`dependencies`;
+  malformed output retries then fails terminally; planner cannot invoke
+  a write tool (allowed-tools gate).
+- **Forger output discipline:** non-diff output (prose) is rejected
+  before reaching the sandbox.
+- **Integration (`//go:build integration`):** end-to-end `gnoma forge`
+  on a fixture repo with a trivial spec, real arms, real worktree —
+  produces an applied, test-passing, critic-approved patch.
+
+### Acceptance criteria
+
+1. `gnoma forge --spec … "<prompt>"` runs Plan → Forge → Sandbox →
+   Critic to either an approved patch or a clean bounded failure.
+2. A failing sandbox loops back to Forge with raw logs and **never**
+   reaches the Critic that iteration.
+3. The Critic runs on a different vendor/arm than the Forge (or warns).
+4. Patches apply via `git apply` in an isolated worktree; the user's
+   working tree is untouched until the final approved patch is offered.
+5. A corrupt patch or worktree desync aborts with a revert — no infinite
+   loop.
+6. Docker backend is selectable via config without changing `forge.go`.
+7. All firewall / audit / egress / CWD-classification boundaries apply to
+   every stage.
+
+---
+
+## Open questions (resolve at implementation)
+
+- **Sandbox backend default** — git-worktree (chosen: no daemon, fits
+  static binary) vs docker-ephemeral (the user's diagram's default).
+  Worktree default; docker the swappable backend.
+- **Final patch delivery** — auto-apply the approved patch to the user's
+  tree, or leave it staged in the worktree / emit it as a `.patch` for
+  the user to apply. Lean: emit + offer to apply (never silently mutate
+  the working tree).
+- **Critic arm selection** — explicit `--critic-arm` vs automatic "pick
+  the highest-quality arm from a different vendor than Forge". Support
+  both; auto by default.
+- **Lifecycle command source** — `[forge.sandbox]` config vs
+  auto-detection from project type. Auto-detect with config override.
+- **Planner/Forger/Critic as router task-types** — whether to add
+  `TaskPlan` / `TaskCritique` `TaskType`s so the bandit can learn
+  per-stage arm quality, or pin arms explicitly. Start pinned; add
+  task-types if telemetry justifies (ties to the bandit-design TODO).
+- **Relationship to the `agent` tool / elf orchestration** — MAEF is a
+  fixed pipeline; the existing `internal/tool/agent` fan-out stays for
+  interactive sub-agent spawning. Keep them separate.
+
+---
+
+## TODO linkage
+
+New "Multi-Agent Engineering Forge (MAEF)" entry in `TODO.md` (In
+flight) links here. Builds on the engine, elf manager, router
+(`ForceArm` for cross-vendor critique), and security boundaries; the
+only new abstraction is `internal/sandbox` (worktree default, docker
+optional). The deterministic orchestrator lives in `internal/forge` as a
+Go state machine — the LLM stages are elfs, the validation gate is not.
@@ -0,0 +1,230 @@
+# TUI/UX refresh — opencode-inspired patterns — 2026-06-04
+
+Closes concrete UX gaps in gnoma's existing Bubble Tea TUI by borrowing
+proven interaction patterns from **opencode** (peer AI-coding TUI) and the
+layout/component philosophy of **opentui**.
+
+Adds the TODO.md entry "TUI/UX refresh — opencode-inspired patterns".
+
+References:
+
+- opencode — <https://github.com/anomalyco/opencode> (UX patterns to mine).
+- opentui — <https://github.com/anomalyco/opentui> (component/layout
+  *concepts* only — see "What we do **not** borrow" below).
+
+---
+
+## Problem
+
+gnoma already ships a capable Bubble Tea v2 TUI
+(`internal/tui/`, launched from `cmd/gnoma/main.go:109-115,1151-1172`):
+themes (`theme.go:30-106`), pickers, slash commands
+(`completions.go:17-46`), vim mode (`app.go:378-422`), an elf-progress
+tree (`rendering.go:373-456`), a three-segment status line
+(`rendering.go:551-620`), and permission-mode cycling
+(`app.go:643-668`). This is **not greenfield** — it is gap-closing.
+
+opencode is the closest peer (a terminal-first agentic coder) and has
+converged on a handful of UX patterns gnoma lacks or under-serves. This
+plan ports those patterns onto the existing `internal/tui/*` surface,
+mapping each to the file:line it touches. Nothing here rewrites the TUI;
+each item is an additive refinement.
+
+### What we do **not** borrow
+
+opentui is a **Zig core with TypeScript bindings** (C-ABI, SolidJS/React
+reconcilers, WebGPU targets). None of it is consumable from gnoma's
+Go + Bubble Tea stack. We take exactly two *concepts* from it and write
+them in Go:
+
+1. **Layout primitives over manual string-joining.** opentui leans on a
+   flexbox layout engine; gnoma's `rendering.go` hand-assembles regions
+   with `lipgloss.JoinVertical/Horizontal`. We formalise a small
+   region/pane layout helper rather than adopting any opentui code.
+2. **Core-vs-bindings split.** Keep render-state (the "what") separate
+   from lipgloss styling (the "how"), so themes and future render
+   targets don't fork the view logic.
+
+We do **not** add a reconciler, a second render target, WebGPU, or any
+non-Go dependency. opentui stays inspiration, not import.
+
+---
+
+## Non-goals
+
+- **A rewrite of the Bubble Tea model.** `app.go`'s `Model`/`Update`/
+  `View` stay; every item is additive.
+- **A second render backend** (web/WebGPU). The `gnoma web` milestone
+  (M15) is tracked separately; this plan is terminal-only.
+- **A client/server split.** opencode runs a TS server behind its TUI;
+  gnoma is a single static binary and stays that way. The session-share
+  item below is export/import, not a hosted service.
+- **Replacing glamour markdown rendering.** We refine how diffs and tool
+  output render, not the markdown engine.
+
+---
+
+## Design — patterns, each mapped to the existing TUI
+
+### 1. Agent / mode switch on a single key (opencode `Tab`)
+
+opencode toggles **plan** (read-only, asks before bash) vs **build**
+(full access) with `Tab`. gnoma already *has* the underlying machine —
+`permission.Mode` (bypass / deny / plan / accept_edits / auto) cycled
+via Shift+Tab (`app.go:643-668`). The gap is discoverability and a
+first-class "plan vs do" framing.
+
+- Promote **plan** and **accept_edits/auto** to a labelled two-state
+  toggle surfaced in the status line (`rendering.go:551-620`), with the
+  full five-mode cycle still on Shift+Tab. Reuse `ModeColor`
+  (`theme.go:164-171`) for the indicator.
+- No new permission semantics — pure presentation over the existing
+  `permission.Checker`.
+
+### 2. Leader-key command palette
+
+Today slash commands are typed (`/model`, `/theme`, …) with completion
+(`completions.go:17-46`, `app.go:1188-1500+`). opencode adds a
+leader-key palette for the same actions without typing `/`.
+
+- Add a leader key (default `Ctrl+K`, configurable) that opens the
+  existing picker overlay machinery (`app.go:339-366`,
+  `rendering.go:126-148`) pre-populated with the `builtinCommands`
+  source. This is a new *entry point* to existing pickers, not a new
+  widget.
+
+### 3. External theme files (opencode-style theming)
+
+gnoma has five built-in themes hardcoded in `theme.go:30-106`. opencode
+loads user theme files. Extend, don't replace:
+
+- Keep the five built-ins. Add loading of `*.toml`/`*.json` theme files
+  from `~/.config/gnoma/themes/` and `.gnoma/themes/`, parsed into the
+  existing `Theme` struct (`theme.go:13-27`) and registered into the
+  `Themes` array. `/theme <name>` and the picker pick them up for free.
+- The `[tui] theme` config key (`config.go:434-437`) already selects by
+  name; user themes just widen the namespace.
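
For the JSON half of theme loading, the parse-and-register step might look like this. It is a sketch under loud assumptions: the real `Theme` struct has more fields than the two shown, `ParseTheme` and `Register` are hypothetical names, and rejecting a name that collides with an existing theme is one possible policy:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Theme is a hypothetical two-field stand-in for the struct in theme.go.
type Theme struct {
	Name   string `json:"name"`
	Accent string `json:"accent"` // hex color, e.g. "#ff8800"
}

// ParseTheme decodes one user theme file and requires a name.
func ParseTheme(raw []byte) (Theme, error) {
	var t Theme
	if err := json.Unmarshal(raw, &t); err != nil {
		return Theme{}, fmt.Errorf("theme file: %w", err)
	}
	if t.Name == "" {
		return Theme{}, fmt.Errorf("theme file: missing name")
	}
	return t, nil
}

// Register appends a user theme unless its name is already taken,
// which keeps the built-ins unchanged.
func Register(themes []Theme, t Theme) ([]Theme, error) {
	for _, have := range themes {
		if have.Name == t.Name {
			return themes, fmt.Errorf("theme %q already registered", t.Name)
		}
	}
	return append(themes, t), nil
}
```

On any error the caller would keep the configured built-in theme, matching the "no panic" requirement in the tests.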
+
+### 4. Diff & file-tree rendering for edits
+
+Tool results currently render generically (`rendering.go:254-371`). The
+biggest visible opencode win is **syntax-aware diff rendering** for
+file edits.
+
+- Detect `fs.edit`/`fs.write` tool results (the edit tool already emits a
+  diff-style payload, `internal/tool/fs/edit.go:136-191`) and render
+  them as a proper red/green unified diff using theme colors, instead of
+  raw text.
+- Optional: a compact changed-files summary line per turn (paths +
+  +/- counts), themed via the status palette.
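
An in-house colorizer can be only a few lines. The sketch below uses raw ANSI escapes as stand-ins for the theme's lipgloss styles and also returns the +/- counts for the per-turn summary line; `ColorizeDiff` is a hypothetical name, and the prefix test would misread a removed line whose content itself starts with `--` as a file header:

```go
package main

import "strings"

const (
	red   = "\x1b[31m" // stand-in for the theme's "removed" style
	green = "\x1b[32m" // stand-in for the theme's "added" style
	reset = "\x1b[0m"
)

// ColorizeDiff wraps added and removed lines of a unified diff in color
// and counts them. File headers, hunk headers, and context pass through.
func ColorizeDiff(diff string) (out string, added, removed int) {
	lines := strings.Split(diff, "\n")
	for i, l := range lines {
		switch {
		case strings.HasPrefix(l, "+++") || strings.HasPrefix(l, "---"):
			// file headers: leave unstyled and uncounted
		case strings.HasPrefix(l, "+"):
			lines[i] = green + l + reset
			added++
		case strings.HasPrefix(l, "-"):
			lines[i] = red + l + reset
			removed++
		}
	}
	return strings.Join(lines, "\n"), added, removed
}
```

A non-diff tool result contains no `+`/`-` prefixed lines in the common case, so it passes through unchanged and can stay on the generic rendering path.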
+
+### 5. Session resume / share (export-import, no server)
+
+opencode has session sharing via its server. gnoma's no-phone-home
+posture rules out hosting, but the *resume* and *portable export* parts
+fit:
+
+- `internal/session` already persists sessions (`SessionStore`). Add a
+  TUI session picker (`/sessions`) over the store + the project registry
+  (`~/.config/gnoma/projects.json`, shipped in `56d7217`) for
+  cross-project recency.
+- "Share" becomes **export to a self-contained transcript file**
+  (markdown or JSON) the user can attach anywhere — explicitly local,
+  documented in the Security section.
+
+### 6. LSP-backed context (opencode parity, optional)
+
+opencode feeds LSP diagnostics into context. This is the largest item
+and is **gated**: it is listed so the spec is complete, but scoped as a
+follow-up that depends on whether an LSP client lands in `internal/tool`.
+For now we acknowledge the gap and do not build it under this plan.
+
+### 7. Layout helper (the one opentui concept)
+
+`rendering.go` joins regions imperatively. Introduce a tiny
+`internal/tui/layout` helper expressing the chat / status / input /
+overlay regions declaratively (sizes, weights, ordering) so resize
+handling and overlay placement stop being ad-hoc. View logic computes a
+layout tree of *regions*; lipgloss styling stays in `theme.go`. This is
+the "core vs bindings" split, in Go, with zero new deps.
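
A declarative region list could be resolved into row counts along these lines. This is a sketch, not the planned API: `Region`, its `Fixed`/`Weight` fields, and `Solve` are assumed names, and only vertical stacking is modelled:

```go
package main

// Region is one vertical band of the screen (hypothetical shape).
type Region struct {
	Name   string
	Fixed  int // exact rows when > 0
	Weight int // share of leftover rows when Fixed == 0
}

// Solve assigns a height to every region without exceeding total.
// Fixed regions are served first in list order; flexible regions then
// split the remainder by weight, with rounding slack going to the last.
func Solve(regions []Region, total int) map[string]int {
	heights := make(map[string]int, len(regions))
	left, weights := total, 0
	for _, r := range regions {
		if r.Fixed > 0 {
			h := r.Fixed
			if h > left {
				h = left // narrow terminal: clamp instead of overlapping
			}
			heights[r.Name] = h
			left -= h
		} else if r.Weight > 0 {
			weights += r.Weight
		}
	}
	remaining, lastFlex := left, -1
	for i, r := range regions {
		if r.Fixed > 0 || r.Weight <= 0 {
			continue
		}
		h := left * r.Weight / weights
		heights[r.Name] = h
		remaining -= h
		lastFlex = i
	}
	if lastFlex >= 0 {
		heights[regions[lastFlex].Name] += remaining
	}
	return heights
}
```

For a 24-row terminal with a one-row status line and a three-row input, a single weighted chat region receives the remaining 20 rows, and the same call on a resize recomputes everything from the region list alone.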
+
+---
+
+## Touch-points (file:line)
+
+| Change | Location |
+|---|---|
+| Plan/build mode toggle + status indicator | `internal/tui/app.go:643-668`, `internal/tui/rendering.go:551-620`, `theme.go:164-171` |
+| Leader-key palette entry point | `internal/tui/app.go:339-366,585-598`, `completions.go:17-46`, picker render `rendering.go:126-148` |
+| External theme file loading | `internal/tui/theme.go:13-27,30-106,182-246`, config key `internal/config/config.go:434-437` |
+| Diff rendering for edits | `internal/tui/rendering.go:254-371`, edit-diff source `internal/tool/fs/edit.go:136-191` |
+| Session picker + transcript export | `internal/tui/app.go:1188-1500+` (new `/sessions`, `/export`), `internal/session` `SessionStore`, project registry |
+| Layout helper | new `internal/tui/layout/`, consumed by `rendering.go:21-64` |
+| New keybindings registry | `internal/tui/app.go:336-810` (centralise the literals), `[tui]` config |
+
+---
+
+## Testing (TDD — write first)
+
+- **Theme loading:** a malformed user theme file is rejected with a
+  clear error and falls back to the configured built-in (no panic).
+  A valid user theme appears in the picker and `ApplyTheme` produces the
+  expected styles.
+- **Diff rendering:** an `fs.edit` result renders as red/green hunks;
+  a non-diff tool result is unaffected (golden-string test on the
+  rendered output).
+- **Palette:** leader key opens the palette pre-filled with the same
+  commands `completionSource` yields; selecting an item dispatches the
+  identical `handleCommand` path as typing the slash command.
+- **Mode toggle:** the labelled toggle and Shift+Tab cycle stay in sync
+  with `permission.Checker`'s mode; the status indicator color matches
+  `ModeColor`.
+- **Session picker / export:** picker lists sessions from the store +
+  registry ordered by recency; export produces a transcript that
+  round-trips (re-import yields the same message list).
+- **Layout helper:** unit tests on region sizing across terminal widths
+  (narrow / wide / resize) with no overlap and correct overlay placement.
+- **Render snapshots:** golden tests for `View()` at representative
+  states (streaming, picker open, permission prompt) so refactors are
+  caught.
+
+### Acceptance criteria
+
+1. `Ctrl+K` opens a command palette routing to the same actions as
+   slash commands.
+2. A user theme file in `~/.config/gnoma/themes/` is selectable and
+   applies; built-ins unchanged.
+3. File edits render as a colored unified diff in the chat.
+4. A plan/build mode indicator is visible in the status line; both the
+   toggle and Shift+Tab drive `permission.Checker`.
+5. `/sessions` lists and resumes prior sessions across projects;
+   `/export` writes a self-contained transcript.
+6. No new non-Go dependency; binary stays single-static.
+
+---
+
+## Open questions (resolve at implementation)
+
+- **Leader key default** — `Ctrl+K` vs leaving it config-only to avoid
+  clashing with existing bindings (`app.go:336-810`). Default `Ctrl+K`,
+  configurable.
+- **Theme file format** — TOML (matches gnoma config) vs JSON (matches
+  opencode themes, eases porting their palettes). Lean TOML; accept both.
+- **opencode-vs-opentui scope** — we deliberately take UX *patterns*
+  from opencode and only two layout *concepts* from opentui. If a future
+  `gnoma web` target lands, revisit whether the layout helper should
+  generalise toward an opentui-style region tree.
+- **Diff renderer** — write a minimal in-house unified-diff colorizer vs
+  pull a small Go diff-rendering lib. Prefer in-house (no dep, the edit
+  tool already emits structured diffs).
+- **LSP context (item 6)** — out of scope here; gate on an
+  `internal/tool` LSP client landing.
+
+---
+
+## TODO linkage
+
+New "TUI/UX refresh — opencode-inspired patterns" entry in `TODO.md`
+(In flight) links here. Gap-closing against the existing
+`internal/tui/*`; opencode supplies the UX patterns, opentui supplies
+two layout concepts (re-implemented in Go, not imported).
@@ -0,0 +1,113 @@
+# Implementation roadmap — 2026-06-04
+
+Root sequencing spec for the in-flight work. Each tier is a self-contained
+merge unit; tiers may overlap when plans are written by separate elfs but
+the listed order is the *target* sequence.
+
+Ties together the open items from [TODO.md §In flight](../../TODO.md)
+and the 2026-06-04 plans under `docs/superpowers/plans/`.
+
+---
+
+## Tier 1 — Small ships, low coupling (~1-2 weeks)
+
+| # | Plan | Depends on | Surface |
+|---|---|---|---|
+| 1 | [2026-06-04-config-migration-followups.md](../plans/2026-06-04-config-migration-followups.md) | — | encoder fix (Duration pointer) |
+| 2 | [2026-06-04-minimax-provider.md](../plans/2026-06-04-minimax-provider.md) | — | `openaicompat` + metered billing slice |
+| 3 | [2026-06-04-models-dev-source-of-truth.md](../plans/2026-06-04-models-dev-source-of-truth.md) | — | embedded snapshot + read-side wiring |
+
+All three are provider/router-adjacent and parallelize cleanly. None
+touch the engine loop. Each is a self-contained PR.
+
+**Note on Tier 1 ordering vs. egress:** models.dev ships with the
+embedded-snapshot default (per its plan). Only the `models refresh`
+wire-fetch path is gated behind the Tier 3 egress work, so egress is
+**not** a hard dependency for the Tier 1 ship.
+
+## Tier 2 — UX + integration polish (~2-3 weeks, parallelizable)
+
+| # | Plan | Depends on | Surface |
+|---|---|---|---|
+| 4 | [2026-06-04-tui-ux-opencode.md](../plans/2026-06-04-tui-ux-opencode.md) | — | additive on `internal/tui/*` |
+| 5 | [2026-06-04-distribution-followups.md](../plans/2026-06-04-distribution-followups.md) | — | cosign, brew, dockers_v2 |
+
+Pure polish. No engine change. Can run in parallel with Tier 1 and Tier 3.
+
+## Tier 3 — Egress foundation (~2-3 weeks)
+
+| # | Plan | Depends on | Surface |
+|---|---|---|---|
+| 6 | [2026-06-04-egress-allowlist.md](../plans/2026-06-04-egress-allowlist.md) | audit log (already shipped) | transport-layer Learn → Review → Enforce |
+
+Blocks the wire-fetch path of models.dev refresh, future SDK egress
+controls, and any future "gnoma fetches at runtime" feature.
+
+## Tier 4 — Cross-platform Phase 1 (~1 week)
+
+| # | Plan | Depends on | Surface |
+|---|---|---|---|
+| 7 | [2026-06-04-cross-platform.md](../plans/2026-06-04-cross-platform.md) (Phase 1 only) | — | release-archive smoke matrix per platform |
+
+Per the plan: Phase 1 is the precondition for an honest r/devops post.
+Phase 2 items land one-per-PR as r/devops questions surface.
+
+**Promote to Tier 2 if r/devops is on the near-term calendar.**
+
+## Tier 5 — New protocol / orchestration (~2-4 weeks each)
+
+| # | Plan | Depends on | Surface |
+|---|---|---|---|
+| 8a | [2026-06-04-agent-client-protocol.md](../plans/2026-06-04-agent-client-protocol.md) (server side) | — | `gnoma acp` over stdio |
+| 8b | [2026-06-04-agent-client-protocol.md](../plans/2026-06-04-agent-client-protocol.md) (client side) | 8a | external ACP agents as router arms |
+| 9 | [2026-06-04-multi-agent-engineering-forge.md](../plans/2026-06-04-multi-agent-engineering-forge.md) | — | `internal/forge` state machine + `internal/sandbox` + 3 elfs |
+
+ACP is split into two PRs (server-side, then client-side) — the
+server-side drives editors (Zed, Kiro, OpenCode), the client-side
+consumes external ACP agents as router arms. Same wire protocol, two
+roles, two PRs.
+
+**Why ACP before MAEF:** MAEF has no hard dependency on ACP, but
+shipping ACP first means a future MAEF Critic can be an external ACP
+agent via `router.ForceArm` instead of being locked to a gnoma elf.
+**Flip to MAEF-first if MAEF is the next-release headline.**
+
+## Tier 6 — Older open plans (May)
+
+| Plan | Note |
+|---|---|
+| [2026-05-24-config-migration.md](../plans/2026-05-24-config-migration.md) | Phase 2+ (doctor already shipped in `f321dab`; project registry in `56d7217`). Follow-up plan is Tier 1 #1. |
+| [2026-05-24-sensitive-content-policy.md](../plans/2026-05-24-sensitive-content-policy.md) | Cross-cuts. Held until entropy-FP telemetry (Phase F-1) observed in production. |
+| [2026-05-25-encoder-bandit-router.md](../plans/2026-05-25-encoder-bandit-router.md) | Supersedes the open bandit-design question in TODO. Revisit when SLM dispatcher is in production. |
+| [2026-05-23-tool-router-specialization.md](../plans/2026-05-23-tool-router-specialization.md) | Telemetry-gated at 20% did-switch rate. May never ship. |
+
+## Shipped (carried for history)
+
+`2026-05-19-post-slm-unlock.md`, `2026-05-23-prefer-routing-policy.md`,
+`2026-05-23-routing-defaults-refresh.md`, `2026-05-23-startup-safety-banner.md`,
+`2026-05-19-security-wave1-safeprovider.md`, `2026-05-19-security-wave2-incognito.md`.
+
+## Sequencing rationale (the 3 push-back points)
+
+1. **models.dev before egress** — the plan is explicitly offline-first
+   (embedded snapshot is default). Ship the read-side plumbing first so
+   every later arm addition benefits from correct pricing/caps. Refresh
+   is a Phase 2 follow-up gated on Tier 3.
+2. **ACP before MAEF** — see Tier 5 note. Future-proofs the MAEF Critic
+   path. Flip if MAEF is the release headline.
+3. **TUI/UX and distribution in parallel** — neither depends on the
+   other, so merge whichever PR is ready first.
+
+## Decision points to revisit
+
+| Question | Effect |
+|---|---|
+| Is r/devops on the near-term calendar? | Promote cross-platform Phase 1 to Tier 2. |
+| Is MAEF the next-release headline? | Flip Tier 5 to MAEF-then-ACP. |
+| Will the SLM be running in production soon? | Promote encoder-bandit router to active. |
+
+## Open question for the maintainer
+
+Should the `docs/superpowers/specs/` directory become the home for
+**sequencing / cross-cutting** docs (this roadmap, future triage notes)
+while `plans/` stays per-feature? Currently `specs/` is empty.
@@ -3,27 +3,41 @@ package config
 import "time"

 // Config is the top-level configuration.
+//
+// Fields tagged with `,omitempty` are skipped by the encoder at
+// their Go zero value, which stops `gnoma config set` from writing
+// zero-valued entries for fields the user never set. Fields where
+// the zero value can be a legitimate user choice (numeric / bool
+// where 0 / false is meaningful) are pointer types so nil (absent)
+// and *zero (explicit) are distinguishable at resolve time — see
+// Resolved() and ResolvedConfig in resolve.go.
 type Config struct {
 	// DefaultProfile names the profile loaded when no --profile flag is
 	// passed. Only meaningful when ~/.config/gnoma/profiles/ exists; see
 	// LoadWithProfile.
-	DefaultProfile string `toml:"default_profile"`
+	DefaultProfile string `toml:"default_profile,omitempty"`

-	Provider   ProviderSection   `toml:"provider"`
-	Permission PermissionSection `toml:"permission"`
-	Tools      ToolsSection      `toml:"tools"`
-	RateLimits RateLimitSection  `toml:"rate_limits"`
-	Security   SecuritySection   `toml:"security"`
-	Session    SessionSection    `toml:"session"`
-	SLM        SLMSection        `toml:"slm"`
-	Router     RouterSection     `toml:"router"`
-	Safety     SafetySection     `toml:"safety"`
-	CLIAgents  CLIAgentsSection  `toml:"cli_agents"`
-	Arms       []ArmConfig       `toml:"arms"`
-	Hooks      []HookConfig      `toml:"hooks"`
-	MCPServers []MCPServerConfig `toml:"mcp_servers"`
-	Plugins    PluginsSection    `toml:"plugins"`
-	TUI        TUISection        `toml:"tui"`
+	// Settings holds gnoma-level options that aren't tied to a
+	// specific section (provider, tools, etc.). Currently just the
+	// project-registry toggle; future home for log level, telemetry
+	// flags, etc.
+	Settings SettingsSection `toml:"config,omitempty"`
+
+	Provider   ProviderSection   `toml:"provider,omitempty"`
+	Permission PermissionSection `toml:"permission,omitempty"`
+	Tools      ToolsSection      `toml:"tools,omitempty"`
+	RateLimits RateLimitSection  `toml:"rate_limits,omitempty"`
+	Security   SecuritySection   `toml:"security,omitempty"`
+	Session    SessionSection    `toml:"session,omitempty"`
+	SLM        SLMSection        `toml:"slm,omitempty"`
+	Router     RouterSection     `toml:"router,omitempty"`
+	Safety     SafetySection     `toml:"safety,omitempty"`
+	CLIAgents  CLIAgentsSection  `toml:"cli_agents,omitempty"`
+	Arms       []ArmConfig       `toml:"arms,omitempty"`
+	Hooks      []HookConfig      `toml:"hooks,omitempty"`
+	MCPServers []MCPServerConfig `toml:"mcp_servers,omitempty"`
+	Plugins    PluginsSection    `toml:"plugins,omitempty"`
+	TUI        TUISection        `toml:"tui,omitempty"`
 }

 // SLMSection configures the optional small language model used for task
@@ -40,14 +54,36 @@ type Config struct {
 //
 // See docs/slm-backends.md for copy-paste presets.
 type SLMSection struct {
-	Enabled        bool     `toml:"enabled"`
-	Backend        string   `toml:"backend"`         // auto | ollama | llamacpp | llamafile | openaicompat | disabled (empty = auto)
-	Model          string   `toml:"model"`           // model name (ollama/llamacpp/openaicompat); ignored for llamafile
-	BaseURL        string   `toml:"base_url"`        // server URL; defaults per-backend
-	ModelURL       string   `toml:"model_url"`       // llamafile-only: where to download the binary from
-	DataDir        string   `toml:"data_dir"`        // llamafile-only: where to put it (empty = XDG default)
-	ExpectedSHA256 string   `toml:"expected_sha256"` // llamafile-only: verify hash if non-empty
-	StartupTimeout Duration `toml:"startup_timeout"` // llamafile-only: first-launch wait budget; 0 = default 5s
+	Enabled        bool      `toml:"enabled,omitempty"`
+	Backend        string    `toml:"backend,omitempty"`         // auto | ollama | llamacpp | llamafile | openaicompat | disabled (empty = auto)
+	Model          string    `toml:"model,omitempty"`           // model name (ollama/llamacpp/openaicompat); ignored for llamafile
+	BaseURL        string    `toml:"base_url,omitempty"`        // server URL; defaults per-backend
+	ModelURL       string    `toml:"model_url,omitempty"`       // llamafile-only: where to download the binary from
+	DataDir        string    `toml:"data_dir,omitempty"`        // llamafile-only: where to put it (empty = XDG default)
+	ExpectedSHA256 string    `toml:"expected_sha256,omitempty"` // llamafile-only: verify hash if non-empty
+	StartupTimeout *Duration `toml:"startup_timeout,omitempty"` // llamafile-only: first-launch wait budget; nil = default 5s
+
+	// ClassifyTimeout caps each task-classification call to the SLM.
+	// nil means "use the built-in default" (15s). An explicit zero
+	// resolves to 0, which the SLM layer treats the same as nil
+	// (see internal/slm/classifier.go), so it also yields the default.
+	// The pointer conversion was added in the 2026-06-04 follow-up so
+	// the encoder can honor omitempty; see the plan referenced in resolve.go.
+	ClassifyTimeout *Duration `toml:"classify_timeout,omitempty"`
+
+	// RegisterAsArm controls whether the SLM model is registered as
+	// a tier-0 execution arm in addition to its classifier role.
+	// nil (absent) → true (preserve historical behaviour: SLM is
+	// both classifier and an execution arm for trivial-complexity
+	// prompts). Explicitly false → SLM is classifier-only; trivial
+	// prompts route to other local arms instead.
+	//
+	// Set this to false when the SLM model is task-specialised
+	// (FunctionGemma, embedding-only models, code-completion-tuned
+	// models) and would produce wrong-shape output if asked to
+	// answer a general prompt. Pointer type so the absent-value
+	// case can be distinguished from explicit false.
+	RegisterAsArm *bool `toml:"register_as_arm,omitempty"`
 }

 // ArmConfig tunes routing for a single registered arm. Multiple [[arms]]
@@ -69,9 +105,9 @@ type SLMSection struct {
 // Strength names map to router.TaskType via router.ParseTaskType — same
 // names the SLM classifier emits (snake_case or no separator both work).
 type ArmConfig struct {
-	ID         string   `toml:"id"`
-	Strengths  []string `toml:"strengths"`
-	CostWeight float64  `toml:"cost_weight"`
+	ID         string   `toml:"id,omitempty"`
+	Strengths  []string `toml:"strengths,omitempty"`
+	CostWeight float64  `toml:"cost_weight,omitempty"`
 }

 // CLIAgentsSection maps canonical CLI agent names to override binary names.
@@ -103,15 +139,15 @@ type SafetySection struct {
 	// RefuseInSystemDirs gates the refuse path. When false, system
 	// roots like / and /etc are treated as warn-tier instead of refuse.
 	// Default: true.
-	RefuseInSystemDirs *bool `toml:"refuse_in_system_dirs"`
+	RefuseInSystemDirs *bool `toml:"refuse_in_system_dirs,omitempty"`
 	// WarnInHome gates the warn-tier check for $HOME and common
 	// dumping grounds (~/Desktop, ~/Downloads, /tmp). When false,
 	// these all become OK-tier (banner still shown). Default: true.
-	WarnInHome *bool `toml:"warn_in_home"`
+	WarnInHome *bool `toml:"warn_in_home,omitempty"`
 	// RequireProjectMarker, when true, treats any directory without
 	// a recognized project marker as warn-tier (even inside a git
 	// repo). Default: false — git repo is enough by default.
-	RequireProjectMarker bool `toml:"require_project_marker"`
+	RequireProjectMarker bool `toml:"require_project_marker,omitempty"`
 }

 // ResolvedSafety returns the effective Safety settings with defaults
@@ -148,7 +184,11 @@ type RouterSection struct {
 	// arm context window. Useful for debugging or for forcing the behavior
 	// on a large local model. Defaults to false: two-stage activates
 	// automatically on local arms with context window <= 16k.
-	ForceTwoStage bool `toml:"force_two_stage"`
+	//
+	// Pointer so the absent-vs-explicit-false distinction is preserved
+	// across write/read cycles; the resolver substitutes the default
+	// (false) for nil. See ResolvedRouterSection in resolve.go.
+	ForceTwoStage *bool `toml:"force_two_stage,omitempty"`

 	// Prefer biases routing toward local arms ("local"), cloud arms
 	// ("cloud"), or leaves the tier-based selection unchanged ("auto").
@@ -156,12 +196,12 @@ type RouterSection struct {
 	// not hard-filter the dispreferred set. Forced arms (--provider X)
 	// and incognito take priority over this knob. See
 	// docs/superpowers/plans/2026-05-23-prefer-routing-policy.md.
-	Prefer string `toml:"prefer"`
+	Prefer string `toml:"prefer,omitempty"`

 	// Bandit exposes the selector's tuning knobs. Defaults preserve
 	// previous hard-coded behaviour exactly; only set these when you
 	// need to tune the EMA quality tracker for an unusual workload.
-	Bandit BanditSection `toml:"bandit"`
+	Bandit BanditSection `toml:"bandit,omitempty"`
 }

 // BanditSection holds the scoring knobs for the EMA quality tracker
@@ -174,23 +214,44 @@ type BanditSection struct {
 	// QualityAlpha is the EMA smoothing factor for arm-quality
 	// observations. Larger values weight recent observations more.
 	// Default: 0.3 (~3-sample memory). 0.0 here means "use default".
-	QualityAlpha float64 `toml:"quality_alpha"`
+	QualityAlpha float64 `toml:"quality_alpha,omitempty"`

 	// MinObservations is the minimum number of samples required
 	// before observed EMA overrides the heuristic fallback. Default:
 	// 3. 0 here means "use default".
-	MinObservations int `toml:"min_observations"`
+	MinObservations int `toml:"min_observations,omitempty"`

 	// ObservedWeight is the weight of the observed EMA in the
 	// observed/heuristic blend inside scoreArm: the final quality is
 	// `observed*W + heuristic*(1-W)`. Default: 0.7. 0.0 here means
 	// "use default".
-	ObservedWeight float64 `toml:"observed_weight"`
+	ObservedWeight float64 `toml:"observed_weight,omitempty"`

 	// StrengthBonus is the quality bonus added when an arm declares
 	// the current task type in its Strengths list. Default: 0.15.
 	// 0.0 here means "use default".
-	StrengthBonus float64 `toml:"strength_bonus"`
+	StrengthBonus float64 `toml:"strength_bonus,omitempty"`
+}
+
+// SettingsSection holds gnoma-level options that aren't tied to
+// a specific functional section (provider, tools, etc.). Lives
+// under `[config]` in the user's TOML file. Current fields:
+//
+//   - ProjectRegistry: opt out of the ~/.config/gnoma/projects.json
+//     write. nil = enabled (default true; preserves v0.3.x
+//     behavior of always recording); *false = opt out.
+//
+// The file itself is purely local — never sent off-machine —
+// see README §Security. The toggle exists for users who don't
+// want the directory log kept at all.
+type SettingsSection struct {
+	// ProjectRegistry controls whether gnoma writes to
+	// ~/.config/gnoma/projects.json (the per-user list of
+	// directories gnoma has been launched in, used by
+	// `gnoma doctor --all-projects`, `gnoma upgrade-config --all`,
+	// and the cross-project session picker). nil = enabled
+	// (default true); *false = opt out.
+	ProjectRegistry *bool `toml:"project_registry,omitempty"`
 }

 // MCPServerConfig defines an MCP server to start and connect to.
@@ -205,17 +266,17 @@ type BanditSection struct {
 //	timeout = "30s"
 //	replace_default = { exec = "bash" }  # MCP tool "exec" replaces built-in "bash"
 type MCPServerConfig struct {
-	Name           string                   `toml:"name"`
-	Command        string                   `toml:"command"`
-	Args           []string                 `toml:"args"`
-	Env            map[string]string        `toml:"env"`
-	Timeout        string                   `toml:"timeout"`
-	ReplaceDefault map[string]string        `toml:"replace_default"` // MCP tool name → built-in name
-	ToolPolicy     map[string]MCPToolPolicy `toml:"tool_policy"`     // MCP tool name → policy
+	Name           string                   `toml:"name,omitempty"`
+	Command        string                   `toml:"command,omitempty"`
+	Args           []string                 `toml:"args,omitempty"`
+	Env            map[string]string        `toml:"env,omitempty"`
+	Timeout        string                   `toml:"timeout,omitempty"`
+	ReplaceDefault map[string]string        `toml:"replace_default,omitempty"` // MCP tool name → built-in name
+	ToolPolicy     map[string]MCPToolPolicy `toml:"tool_policy,omitempty"`     // MCP tool name → policy
 }

 type MCPToolPolicy struct {
-	PathArgs []string `toml:"path_args"`
+	PathArgs []string `toml:"path_args,omitempty"`
 }

 // PluginsSection controls plugin loading.
@@ -226,8 +287,8 @@ type MCPToolPolicy struct {
 //	enabled = ["git-tools", "docker-tools"]
 //	disabled = ["experimental-plugin"]
 type PluginsSection struct {
-	Enabled  []string `toml:"enabled"`
-	Disabled []string `toml:"disabled"`
+	Enabled  []string `toml:"enabled,omitempty"`
+	Disabled []string `toml:"disabled,omitempty"`
 }

 // HookConfig is a single hook entry from TOML config.
@@ -243,17 +304,22 @@ type PluginsSection struct {
 //	timeout = "10s"
 //	fail_open = false
 type HookConfig struct {
-	Name        string `toml:"name"`
-	Event       string `toml:"event"`
-	Type        string `toml:"type"`
-	Exec        string `toml:"exec"`
-	Timeout     string `toml:"timeout"`
-	FailOpen    bool   `toml:"fail_open"`
-	ToolPattern string `toml:"tool_pattern"`
+	Name        string `toml:"name,omitempty"`
+	Event       string `toml:"event,omitempty"`
+	Type        string `toml:"type,omitempty"`
+	Exec        string `toml:"exec,omitempty"`
+	Timeout     string `toml:"timeout,omitempty"`
+	FailOpen    *bool  `toml:"fail_open,omitempty"`
+	ToolPattern string `toml:"tool_pattern,omitempty"`
 }

 type SessionSection struct {
-	MaxKeep int `toml:"max_keep"`
+	// MaxKeep is the maximum number of sessions to retain. nil = use
+	// default (20); *0 = explicitly disable session retention.
+	// Pointer type so the absent-vs-explicit-zero distinction is
+	// preserved across write/read cycles; the resolver substitutes
+	// the default for nil. See ResolvedSessionSection in resolve.go.
+	MaxKeep *int `toml:"max_keep,omitempty"`
 }

 // SecuritySection configures the secret scanner and firewall.
@@ -272,41 +338,53 @@ type SessionSection struct {
 // entropy_safelist names known-safe shapes that bypass the entropy scorer
 // (Phase F-1 FP reduction). Empty / unset preserves pre-F-1 behavior.
 type SecuritySection struct {
-	EntropyThreshold  float64         `toml:"entropy_threshold"`
-	RedactHighEntropy bool            `toml:"redact_high_entropy"`
-	EntropySafelist   []string        `toml:"entropy_safelist"`
-	Patterns          []PatternConfig `toml:"patterns"`
+	// EntropyThreshold is the Shannon-entropy floor above which a
+	// token is treated as a possible secret. nil = use the built-in
+	// default (4.5); *0 disables the entropy pre-filter entirely.
+	// Pointer type so the absent-vs-explicit-zero distinction is
+	// preserved across write/read cycles; the resolver substitutes
+	// the default for nil. See ResolvedSecuritySection in resolve.go.
+	EntropyThreshold *float64 `toml:"entropy_threshold,omitempty"`
+
+	// RedactHighEntropy controls whether high-entropy hits are
+	// redacted in outgoing LLM traffic. nil = false (warn / block
+	// only); *true enables redaction. Pointer type so the absent-
+	// vs-explicit-false distinction is preserved.
+	RedactHighEntropy *bool `toml:"redact_high_entropy,omitempty"`
+
+	EntropySafelist []string        `toml:"entropy_safelist,omitempty"`
+	Patterns        []PatternConfig `toml:"patterns,omitempty"`
 }

 type PatternConfig struct {
-	Name   string `toml:"name"`
-	Regex  string `toml:"regex"`
-	Action string `toml:"action"` // "redact" (default), "block", "warn"
+	Name   string `toml:"name,omitempty"`
+	Regex  string `toml:"regex,omitempty"`
+	Action string `toml:"action,omitempty"` // "redact" (default), "block", "warn"
 }

 type PermissionSection struct {
-	Mode  string           `toml:"mode"`
-	Rules []PermissionRule `toml:"rules"`
+	Mode  string           `toml:"mode,omitempty"`
+	Rules []PermissionRule `toml:"rules,omitempty"`
 }

 type PermissionRule struct {
-	Tool    string `toml:"tool"`
-	Pattern string `toml:"pattern"`
-	Action  string `toml:"action"`
+	Tool    string `toml:"tool,omitempty"`
+	Pattern string `toml:"pattern,omitempty"`
+	Action  string `toml:"action,omitempty"`
 }

 type ProviderSection struct {
-	Default     string            `toml:"default"`
-	Model       string            `toml:"model"`
-	MaxTokens   int64             `toml:"max_tokens"`
-	Temperature *float64          `toml:"temperature"`
-	APIKeys     map[string]string `toml:"api_keys"`
-	Endpoints   map[string]string `toml:"endpoints"`
+	Default     string            `toml:"default,omitempty"`
+	Model       string            `toml:"model,omitempty"`
+	MaxTokens   *int64            `toml:"max_tokens,omitempty"`
+	Temperature *float64          `toml:"temperature,omitempty"`
+	APIKeys     map[string]string `toml:"api_keys,omitempty"`
+	Endpoints   map[string]string `toml:"endpoints,omitempty"`
 }

 type ToolsSection struct {
-	BashTimeout Duration `toml:"bash_timeout"`
-	MaxFileSize int64    `toml:"max_file_size"`
+	BashTimeout Duration `toml:"bash_timeout,omitempty"`
+	MaxFileSize *int64   `toml:"max_file_size,omitempty"`
 }

 // RateLimitSection allows overriding default rate limits per provider.
@@ -326,15 +404,15 @@ type ToolsSection struct {
 type RateLimitSection map[string]RateLimitOverride

 type RateLimitOverride struct {
-	Tier        string  `toml:"tier"`
-	RPS         float64 `toml:"rps"`
-	RPM         int     `toml:"rpm"`
-	RPD         int     `toml:"rpd"`
-	TPM         int     `toml:"tpm"`
-	ITPM        int     `toml:"itpm"`
-	OTPM        int     `toml:"otpm"`
-	TokensMonth int64   `toml:"tokens_month"`
-	SpendCap    float64 `toml:"spend_cap"`
+	Tier        string  `toml:"tier,omitempty"`
+	RPS         float64 `toml:"rps,omitempty"`
+	RPM         int     `toml:"rpm,omitempty"`
+	RPD         int     `toml:"rpd,omitempty"`
+	TPM         int     `toml:"tpm,omitempty"`
+	ITPM        int     `toml:"itpm,omitempty"`
+	OTPM        int     `toml:"otpm,omitempty"`
+	TokensMonth int64   `toml:"tokens_month,omitempty"`
+	SpendCap    float64 `toml:"spend_cap,omitempty"`
 }

 // Duration wraps time.Duration for TOML string parsing (e.g. "30s", "5m").
@@ -354,6 +432,6 @@ func (d Duration) Duration() time.Duration {
 }

 type TUISection struct {
-	Theme string `toml:"theme"`
-	Vim   bool   `toml:"vim"`
+	Theme string `toml:"theme,omitempty"`
+	Vim   bool   `toml:"vim,omitempty"`
 }
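The nil-versus-explicit-zero convention described in the `Config` doc comment can be sketched with a small generic helper. This is only an illustration under assumptions: `orDefault` and `sessionSection` are hypothetical names, and the real resolver lives in `resolve.go`, which this diff does not show.

```go
package main

import "fmt"

// orDefault returns *p when the pointer is set and def when it is nil.
// Hypothetical helper; it is not taken from resolve.go.
func orDefault[T any](p *T, def T) T {
	if p == nil {
		return def
	}
	return *p
}

// sessionSection mirrors the shape of SessionSection for illustration only.
type sessionSection struct {
	MaxKeep *int
}

func main() {
	zero := 0
	// Key missing from the TOML file: the pointer stays nil.
	absent := sessionSection{}
	// User wrote max_keep = 0: the pointer is set and points at zero.
	explicit := sessionSection{MaxKeep: &zero}

	fmt.Println(orDefault(absent.MaxKeep, 20))   // 20
	fmt.Println(orDefault(explicit.MaxKeep, 20)) // 0
}
```

With a plain `int` field both cases would decode to 0 and the explicit choice would be lost, which is the distinction the pointer types are meant to preserve.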
@@ -5,6 +5,8 @@ import (
 	"path/filepath"
 	"testing"
 	"time"
+
+	"github.com/BurntSushi/toml"
 )

 func TestDefaults(t *testing.T) {
@@ -12,8 +14,8 @@ func TestDefaults(t *testing.T) {
 	if cfg.Provider.Default != "" {
 		t.Errorf("Provider.Default = %q, want empty (no default provider)", cfg.Provider.Default)
 	}
-	if cfg.Provider.MaxTokens != 8192 {
-		t.Errorf("Provider.MaxTokens = %d", cfg.Provider.MaxTokens)
+	if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 8192 {
+		t.Errorf("Provider.MaxTokens = %v, want *8192", cfg.Provider.MaxTokens)
 	}
 	if cfg.Tools.BashTimeout.Duration() != 30*time.Second {
 		t.Errorf("Tools.BashTimeout = %v", cfg.Tools.BashTimeout)
@@ -53,8 +55,8 @@ max_file_size = 2097152
 	if cfg.Provider.Model != "claude-sonnet-4" {
 		t.Errorf("Provider.Model = %q", cfg.Provider.Model)
 	}
-	if cfg.Provider.MaxTokens != 16384 {
-		t.Errorf("Provider.MaxTokens = %d", cfg.Provider.MaxTokens)
+	if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 16384 {
+		t.Errorf("Provider.MaxTokens = %v, want *16384", cfg.Provider.MaxTokens)
 	}
 	if cfg.Provider.APIKeys["anthropic"] != "sk-test-123" {
 		t.Errorf("APIKeys[anthropic] = %q", cfg.Provider.APIKeys["anthropic"])
@@ -65,8 +67,8 @@ max_file_size = 2097152
 	if cfg.Tools.BashTimeout.Duration() != 60*time.Second {
 		t.Errorf("Tools.BashTimeout = %v", cfg.Tools.BashTimeout)
 	}
-	if cfg.Tools.MaxFileSize != 2097152 {
-		t.Errorf("Tools.MaxFileSize = %d", cfg.Tools.MaxFileSize)
+	if cfg.Tools.MaxFileSize == nil || *cfg.Tools.MaxFileSize != 2097152 {
+		t.Errorf("Tools.MaxFileSize = %v, want *2097152", cfg.Tools.MaxFileSize)
 	}
 }

@@ -217,7 +219,7 @@ tool_pattern = "bash*"
 	if h.Timeout != "5s" {
 		t.Errorf("Timeout = %q", h.Timeout)
 	}
-	if !h.FailOpen {
+	if h.FailOpen == nil || !*h.FailOpen {
 		t.Error("FailOpen should be true")
 	}
 	if h.ToolPattern != "bash*" {
@@ -444,7 +446,54 @@ model = "claude-haiku"
 		t.Errorf("Model = %q, want claude-haiku (from project)", cfg.Provider.Model)
 	}
 	// Global: max_tokens = 4096
-	if cfg.Provider.MaxTokens != 4096 {
-		t.Errorf("MaxTokens = %d, want 4096 (from global)", cfg.Provider.MaxTokens)
+	if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 4096 {
+		t.Errorf("MaxTokens = %v, want *4096 (from global)", cfg.Provider.MaxTokens)
+	}
+}
+
+func TestSLMSection_RegisterAsArm_AbsentDefaultsToTrue(t *testing.T) {
+	// Absent field → nil pointer → caller treats as default true,
+	// preserving pre-config behaviour where the SLM is always
+	// registered as an execution arm.
+	var cfg Config
+	if _, err := toml.Decode(`[slm]
+enabled = true
+`, &cfg); err != nil {
+		t.Fatalf("decode: %v", err)
+	}
+	if cfg.SLM.RegisterAsArm != nil {
+		t.Errorf("expected nil pointer for absent register_as_arm, got %v", *cfg.SLM.RegisterAsArm)
+	}
+}
+
+func TestSLMSection_RegisterAsArm_ExplicitFalse(t *testing.T) {
+	var cfg Config
+	if _, err := toml.Decode(`[slm]
+enabled = true
+register_as_arm = false
+`, &cfg); err != nil {
+		t.Fatalf("decode: %v", err)
+	}
+	if cfg.SLM.RegisterAsArm == nil {
+		t.Fatal("expected non-nil pointer when register_as_arm is set")
+	}
+	if *cfg.SLM.RegisterAsArm {
+		t.Errorf("expected register_as_arm=false to decode as *false, got *true")
+	}
+}
+
+func TestSLMSection_RegisterAsArm_ExplicitTrue(t *testing.T) {
+	var cfg Config
+	if _, err := toml.Decode(`[slm]
+enabled = true
+register_as_arm = true
+`, &cfg); err != nil {
+		t.Fatalf("decode: %v", err)
+	}
+	if cfg.SLM.RegisterAsArm == nil {
+		t.Fatal("expected non-nil pointer when register_as_arm is set")
+	}
+	if !*cfg.SLM.RegisterAsArm {
+		t.Errorf("expected register_as_arm=true to decode as *true, got *false")
 	}
 }
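One detail worth noting about the updated assertions: formatting a non-nil `*int64` with `%v` prints the pointer address, not the pointed-to value, so a failure message such as `Provider.MaxTokens = %v` would show a hex address. A minimal sketch of a formatting helper follows; `describePtr` is a hypothetical name and is not part of this change.

```go
package main

import "fmt"

// describePtr renders a pointer field for test failure messages:
// "<nil>" when unset, otherwise the pointed-to value prefixed with "*".
// Hypothetical helper, shown only to illustrate the idea.
func describePtr[T any](p *T) string {
	if p == nil {
		return "<nil>"
	}
	return fmt.Sprintf("*%v", *p)
}

func main() {
	var unset *int64
	n := int64(8192)
	fmt.Println(describePtr(unset)) // <nil>
	fmt.Println(describePtr(&n))    // *8192
}
```

A test could then report `describePtr(cfg.Provider.MaxTokens)` in place of the raw pointer, keeping the `want *8192` style of the existing messages.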
@@ -3,11 +3,24 @@ package config
 import "time"

 func Defaults() Config {
+	maxTokens := int64(8192)
+	maxFileSize := int64(1 << 20) // 1MB
+	maxKeep := 20
+	entropyThreshold := 4.5
+	redactHighEntropy := false
+	forceTwoStage := false
+	startupTimeout := Duration(5 * time.Second)
+	classifyTimeout := Duration(0) // 0 = let the SLM layer pick its own 15s default
+	projectRegistry := true
+
 	return Config{
+		Settings: SettingsSection{
+			ProjectRegistry: &projectRegistry,
+		},
 		Provider: ProviderSection{
 			Default:   "",
 			Model:     "",
-			MaxTokens: 8192,
+			MaxTokens: &maxTokens,
 			APIKeys:   make(map[string]string),
 			Endpoints: make(map[string]string),
 		},
@@ -16,11 +29,19 @@ func Defaults() Config {
 		},
 		Tools: ToolsSection{
 			BashTimeout: Duration(30 * time.Second),
-			MaxFileSize: 1 << 20, // 1MB
+			MaxFileSize: &maxFileSize,
+		},
+		Session: SessionSection{MaxKeep: &maxKeep},
+		Security: SecuritySection{
+			EntropyThreshold:  &entropyThreshold,
+			RedactHighEntropy: &redactHighEntropy,
+		},
+		Router: RouterSection{
+			ForceTwoStage: &forceTwoStage,
 		},
-		Session: SessionSection{MaxKeep: 20},
 		SLM: SLMSection{
-			StartupTimeout: Duration(5 * time.Second),
+			StartupTimeout:  &startupTimeout,
+			ClassifyTimeout: &classifyTimeout,
 		},
 		TUI: TUISection{
 			Theme: "catppuccin",
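Because `Defaults()` now fills every pointer field, a higher-priority layer only has to win when its own pointer is non-nil. The sketch below shows that overlay rule in isolation. It is an assumption about how the layers could be combined, not the merge code from this repository, and `overlay` is a hypothetical name.

```go
package main

import "fmt"

// overlay returns the higher-priority value when it is set and falls
// back to the lower-priority one otherwise. Hypothetical helper.
func overlay[T any](base, over *T) *T {
	if over != nil {
		return over
	}
	return base
}

func main() {
	// Value supplied by the defaults layer.
	def := 20
	// Project file sets max_keep = 0 explicitly.
	zero := 0
	// Project file omits the key entirely.
	var absent *int

	fmt.Println(*overlay(&def, absent)) // 20
	fmt.Println(*overlay(&def, &zero))  // 0
}
```

An absent key inherits the lower layer, while an explicit zero overrides it. Non-pointer string fields cannot express that difference, which is why the layering check in the doctor needs a separate presence lookup for them.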
@@ -0,0 +1,431 @@
+package config
+
+import (
+	"fmt"
+	"os"
+	"sort"
+	"strings"
+
+	"github.com/BurntSushi/toml"
+)
+
+// Severity ranks diagnostic findings for the CLI output and
+// exit-code decision. Higher numeric value = more severe.
+type Severity int
+
+const (
+	// SeverityInfo is a neutral observation (e.g. "field is at
+	// the default value, can be removed"). Never causes a
+	// non-zero exit on its own.
+	SeverityInfo Severity = iota
+
+	// SeverityWarn indicates a likely problem the user should
+	// review (e.g. an invalid enum value, an explicit-zero
+	// pointer field that diverges from the default). Causes
+	// a non-zero exit in CLI mode by default.
+	SeverityWarn
+
+	// SeverityError indicates a hard failure (file unreadable,
+	// file unparseable). Causes a non-zero exit.
+	SeverityError
+)
+
+// String returns the lower-case name of the severity for
+// human-readable output.
+func (s Severity) String() string {
+	switch s {
+	case SeverityInfo:
+		return "info"
+	case SeverityWarn:
+		return "warn"
+	case SeverityError:
+		return "error"
+	default:
+		return "?"
+	}
+}
+
+// MarshalJSON encodes Severity as its lower-case name string
+// (e.g. "warn", "error") for stable CI/script consumption.
+// The default Go marshaling would emit the int value, which
+// is opaque to consumers.
+func (s Severity) MarshalJSON() ([]byte, error) {
+	return []byte(`"` + s.String() + `"`), nil
+}
+
+// Finding is one diagnostic result. The CLI renders these
+// either as human-readable text or as JSON (--json flag).
+type Finding struct {
+	Severity   Severity `json:"severity"`
+	Path       string   `json:"path"`
+	Key        string   `json:"key,omitempty"`
+	Message    string   `json:"message"`
+	Suggestion string   `json:"suggestion,omitempty"`
+}
+
+// Doctor runs diagnostic checks on config files. Constructed
+// with NewDoctor; reusable across many files. Stateless after
+// construction — set Defaults to override the comparison
+// baseline (used in tests; production always uses Defaults()).
+type Doctor struct {
+	// Defaults is the baseline for "is this field at the
+	// default value" checks. If nil, Defaults() is used.
+	Defaults *Config
+}
+
+// NewDoctor returns a Doctor with the production defaults
+// baseline.
+func NewDoctor() *Doctor {
+	return &Doctor{Defaults: nil}
+}
+
+// DiagnoseFile runs the full diagnostic suite on a single
+// config file. The returned slice may be empty (file is
+// clean) or contain findings of any severity.
+func (d *Doctor) DiagnoseFile(path string) []Finding {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return []Finding{{
+			Severity: SeverityError,
+			Path:     path,
+			Message:  fmt.Sprintf("read: %v", err),
+		}}
+	}
+
+	var cfg Config
+	meta, err := toml.Decode(string(data), &cfg)
+	if err != nil {
+		return []Finding{{
+			Severity: SeverityError,
+			Path:     path,
+			Message:  fmt.Sprintf("parse: %v", err),
+		}}
+	}
+
+	defaults := d.Defaults
+	if defaults == nil {
+		def := Defaults()
+		defaults = &def
+	}
+
+	var findings []Finding
+	findings = append(findings, d.detectUnknownKeys(path, meta)...)
+	findings = append(findings, d.detectInvalidEnums(path, &cfg)...)
+	findings = append(findings, d.detectExplicitZeros(path, &cfg, defaults)...)
+	return findings
+}
+
+// DiagnoseFiles runs DiagnoseFile on each path in turn and
+// returns the combined findings, stably sorted by path, then
+// by descending severity, then by key. The output order is
+// therefore diff-friendly and independent of the input order.
+func (d *Doctor) DiagnoseFiles(paths []string) []Finding {
+	var findings []Finding
+	for _, p := range paths {
+		findings = append(findings, d.DiagnoseFile(p)...)
+	}
+	// Stable order for diff-friendly CI output.
+	sort.SliceStable(findings, func(i, j int) bool {
+		if findings[i].Path != findings[j].Path {
+			return findings[i].Path < findings[j].Path
+		}
+		if findings[i].Severity != findings[j].Severity {
+			return findings[i].Severity > findings[j].Severity
+		}
+		return findings[i].Key < findings[j].Key
+	})
+	return findings
+}
+
+// DiagnoseLayering compares the resolved views of two config
+// files (typically the global config and a project config)
+// and surfaces "shadowing" findings: cases where the project
+// file's value differs from the global's, and the project's
+// value is at the Go zero (string `""`, int 0, bool false).
+//
+// The original 2026-05-24 silent-corruption bug was exactly
+// this pattern: the project file had `[router] prefer = ""`,
+// silently shadowing the global's `prefer = "cloud"` because
+// TOML's "present field wins" semantics treat `""` as a
+// legitimate value rather than "absent". The doctor catches
+// it without needing the user to read the merge logic.
+//
+// Returns nil if either file is missing or fails to parse
+// (the per-file `DiagnoseFile` already reports both cases; a
+// layering check without both sides has nothing to compare).
+func (d *Doctor) DiagnoseLayering(globalPath, projectPath string) []Finding {
+	if _, err := os.Stat(globalPath); os.IsNotExist(err) {
+		return nil
+	}
+	if _, err := os.Stat(projectPath); os.IsNotExist(err) {
+		return nil
+	}
+
+	var globalCfg, projectCfg Config
+	if _, err := toml.DecodeFile(globalPath, &globalCfg); err != nil {
+		return nil
+	}
+	if _, err := toml.DecodeFile(projectPath, &projectCfg); err != nil {
+		return nil
+	}
+
+	// For non-pointer string fields we need to know whether
+	// the key was actually present in the project's source —
+	// an absent key and a present-empty key look identical in
+	// the typed Config. Parse the project to a raw map for
+	// per-key presence checks.
+	var projectRaw map[string]any
+	if _, err := toml.DecodeFile(projectPath, &projectRaw); err != nil {
+		projectRaw = nil
+	}
+	hasKey := func(section, key string) bool {
+		if projectRaw == nil {
+			return false
+		}
+		sec, ok := projectRaw[section].(map[string]any)
+		if !ok {
+			return false
+		}
+		_, present := sec[key]
+		return present
+	}
+
+	defaults := d.Defaults
+	if defaults == nil {
+		def := Defaults()
+		defaults = &def
+	}
+	defRes := defaults.Resolved()
+
+	var findings []Finding
+
+	// Non-pointer string fields. Project's value is in the
+	// source AND is the empty string AND global's value is a
+	// user-set non-default non-empty string → shadowing. (If
+	// the project key is absent, the field inherits — no
+	// shadowing. If global is also empty, both inherit the
+	// default — no shadowing.)
+	type stringField struct {
+		key, projectVal, globalVal string
+	}
+	stringFields := []stringField{
+		{"router.prefer", projectCfg.Router.Prefer, globalCfg.Router.Prefer},
+		{"permission.mode", projectCfg.Permission.Mode, globalCfg.Permission.Mode},
+		{"provider.default", projectCfg.Provider.Default, globalCfg.Provider.Default},
+		{"provider.model", projectCfg.Provider.Model, globalCfg.Provider.Model},
+	}
+	for _, f := range stringFields {
+		// Parse the key to section/field. The format is
+		// "section.field" — split on the first dot.
+		section, field, _ := strings.Cut(f.key, ".")
+		if !hasKey(section, field) {
+			continue
+		}
+		if f.projectVal != "" {
+			continue
+		}
+		if f.globalVal == "" || f.globalVal == defaultStringFor(f.key) {
+			continue
+		}
+		findings = append(findings, Finding{
+			Severity: SeverityWarn,
+			Path:     projectPath,
+			Key:      f.key,
+			Message: fmt.Sprintf(
+				"project's %s=%q shadows global's %s=%q; the merged value is %q, not the user's global intent",
+				f.key, f.projectVal, f.key, f.globalVal, f.projectVal),
+			Suggestion: "delete the line in the project config to inherit the global value, or set an explicit non-empty value",
+		})
+	}
+
+	// Pointer-converted numeric fields. Project has *0
+	// (explicit zero) when global has a non-default value
+	// → shadowing. (The "is zero" check is on the raw pointer,
+	// not the resolved value, because nil and *0 are different:
+	// nil means "absent" — inherit global — and *0 means
+	// "explicit zero" — override global. The latter is the
+	// bug case.)
+	if projectCfg.Provider.MaxTokens != nil && *projectCfg.Provider.MaxTokens == 0 &&
+		globalCfg.Provider.MaxTokens != nil && *globalCfg.Provider.MaxTokens != defRes.Provider.MaxTokens {
+		findings = append(findings, Finding{
+			Severity: SeverityWarn,
+			Path:     projectPath,
+			Key:      "provider.max_tokens",
+			Message: fmt.Sprintf(
+				"project's provider.max_tokens=0 shadows global's provider.max_tokens=%d",
+				*globalCfg.Provider.MaxTokens),
+			Suggestion: "delete the line to inherit the global value, or set an explicit non-zero value",
+		})
+	}
+
+	return findings
+}
+
+// defaultStringFor returns the documented default value for a
+// given non-pointer string config key. Used by the layering
+// check to distinguish "global is at the default" (no
+// shadowing, nothing to do) from "global has a user-set
+// value" (which the project might shadow).
+func defaultStringFor(key string) string {
+	switch key {
+	case "router.prefer":
+		return "" // prefer defaults to "auto" but resolves to ""
+	case "permission.mode":
+		return "auto"
+	case "provider.default":
+		return ""
+	case "provider.model":
+		return ""
+	}
+	return ""
+}
+
+// detectUnknownKeys surfaces keys in the source that don't map
+// to any Config field (meta.Undecoded reports every undecoded
+// key, nested or top-level). Decoder ignores them silently;
+// doctor flags typos like `[provdier]` or removed-schema leftovers.
+func (d *Doctor) detectUnknownKeys(path string, meta toml.MetaData) []Finding {
+	var findings []Finding
+	for _, k := range meta.Undecoded() {
+		findings = append(findings, Finding{
+			Severity:   SeverityWarn,
+			Path:       path,
+			Key:        k.String(),
+			Message:    fmt.Sprintf("unknown key %q (not in the current Config schema)", k.String()),
+			Suggestion: "remove the section or rename to a known key",
+		})
+	}
+	return findings
+}
+
+// detectInvalidEnums checks enum-typed string fields against
+// their parsers. The current set is intentionally small —
+// only fields with a documented value space and a parser
+// function. Add more as the surface grows.
+func (d *Doctor) detectInvalidEnums(path string, cfg *Config) []Finding {
+	var findings []Finding
+
+	// permission.mode — must be a permission.Mode constant.
+	if cfg.Permission.Mode != "" && !validPermissionMode(cfg.Permission.Mode) {
+		findings = append(findings, Finding{
+			Severity:   SeverityWarn,
+			Path:       path,
+			Key:        "permission.mode",
+			Message:    fmt.Sprintf("invalid permission.mode %q (expected one of: default, accept_edits, bypass, deny, plan, auto)", cfg.Permission.Mode),
+			Suggestion: "fix the value, or remove the line to use the default",
+		})
+	}
+
+	// router.prefer — must parse via router.ParsePreferPolicy.
+	// (That parser accepts "" and "auto" as valid, so we skip
+	// the check on those.)
+	if cfg.Router.Prefer != "" && cfg.Router.Prefer != "auto" &&
+		!validRouterPrefer(cfg.Router.Prefer) {
+		findings = append(findings, Finding{
+			Severity:   SeverityWarn,
+			Path:       path,
+			Key:        "router.prefer",
+			Message:    fmt.Sprintf("invalid router.prefer %q (expected \"local\", \"cloud\", or \"auto\")", cfg.Router.Prefer),
+			Suggestion: "fix the value, or remove the line to use the default",
+		})
+	}
+
+	// slm.backend — must be a recognized backend.
+	if cfg.SLM.Backend != "" && !validSLMBackend(cfg.SLM.Backend) {
+		findings = append(findings, Finding{
+			Severity:   SeverityWarn,
+			Path:       path,
+			Key:        "slm.backend",
+			Message:    fmt.Sprintf("invalid slm.backend %q (expected auto, ollama, llamacpp, llamafile, openaicompat, or disabled)", cfg.SLM.Backend),
+			Suggestion: "fix the value, or remove the line to use the default",
+		})
+	}
+
+	return findings
+}
+
+// detectExplicitZeros surfaces pointer-converted fields whose
+// value is *zero (the user explicitly wrote a zero in the
+// file) and the default's resolved value is non-zero. These
+// are the cases where the user might have a typo (e.g.
+// `max_tokens = 0` when they meant 8192) or an explicit
+// override. The upgrade-config preserves them as user
+// intent; the doctor surfaces them for review.
+func (d *Doctor) detectExplicitZeros(path string, cfg *Config, defaults *Config) []Finding {
+	var findings []Finding
+
+	resolved := cfg.Resolved()
+	defaultsResolved := defaults.Resolved()
+
+	// Provider.MaxTokens
+	if cfg.Provider.MaxTokens != nil && *cfg.Provider.MaxTokens == 0 && resolved.Provider.MaxTokens != defaultsResolved.Provider.MaxTokens {
+		findings = append(findings, Finding{
+			Severity: SeverityWarn,
+			Path:     path,
+			Key:      "provider.max_tokens",
+			Message:  fmt.Sprintf("explicit zero for provider.max_tokens (resolved to %d); the default is %d. Is this intentional?", resolved.Provider.MaxTokens, defaultsResolved.Provider.MaxTokens),
+		})
+	}
+
+	// Tools.MaxFileSize
+	if cfg.Tools.MaxFileSize != nil && *cfg.Tools.MaxFileSize == 0 && resolved.Tools.MaxFileSize != defaultsResolved.Tools.MaxFileSize {
+		findings = append(findings, Finding{
+			Severity: SeverityWarn,
+			Path:     path,
+			Key:      "tools.max_file_size",
+			Message:  fmt.Sprintf("explicit zero for tools.max_file_size (resolved to %d); the default is %d. Zero disables the size cap.", resolved.Tools.MaxFileSize, defaultsResolved.Tools.MaxFileSize),
+		})
+	}
+
+	// Session.MaxKeep
+	if cfg.Session.MaxKeep != nil && *cfg.Session.MaxKeep == 0 && resolved.Session.MaxKeep != defaultsResolved.Session.MaxKeep {
+		findings = append(findings, Finding{
+			Severity: SeverityWarn,
+			Path:     path,
+			Key:      "session.max_keep",
+			Message:  fmt.Sprintf("explicit zero for session.max_keep (resolved to %d); the default is %d. Zero disables session retention.", resolved.Session.MaxKeep, defaultsResolved.Session.MaxKeep),
+		})
+	}
+
+	return findings
+}
+
+// validPermissionMode returns true if s is a recognized
+// permission mode string. Kept as a local function instead of
+// calling permission.Mode.Valid() so doctor stays independent
+// of the permission package's type system (permission.Mode is
+// a typed string with .Valid(), but using it would create a
+// coupling we'd rather avoid here).
+func validPermissionMode(s string) bool {
+	switch s {
+	case "default", "accept_edits", "bypass", "deny", "plan", "auto":
+		return true
+	}
+	return false
+}
+
+// validRouterPrefer returns true if s is a recognized router
+// preference. Mirrors the policy table in router.ParsePreferPolicy
+// without importing that package (the parser lives in
+// internal/router; doctor is in internal/config and the
+// layering would invite import cycles if a future router
+// subpackage ever imports config).
+func validRouterPrefer(s string) bool {
+	switch s {
+	case "auto", "local", "cloud":
+		return true
+	}
+	return false
+}
+
+// validSLMBackend returns true if s is a recognized SLM
+// backend name. Mirrors the constants in internal/slm
+// (auto / ollama / llamacpp / llamafile / openaicompat /
+// disabled) without importing that package.
+func validSLMBackend(s string) bool {
+	switch s {
+	case "auto", "ollama", "llamacpp", "llamafile", "openaicompat", "disabled":
+		return true
+	}
+	return false
+}
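The nil-versus-explicit-zero distinction that `DiagnoseLayering` and `detectExplicitZeros` rely on can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's merge code: `layer`, `resolveMaxTokens`, `shadowsGlobal`, and `ptr` are hypothetical names, and the default of 8192 is taken from the test fixtures in this change.

```go
package main

import "fmt"

// layer is a hypothetical, trimmed-down stand-in for one config file.
// A nil pointer means "key absent"; a non-nil pointer means "key present",
// even when the pointed-to value is zero.
type layer struct {
	MaxTokens *int64
}

// resolveMaxTokens applies project-over-global-over-default layering.
func resolveMaxTokens(def int64, global, project layer) int64 {
	if project.MaxTokens != nil {
		return *project.MaxTokens
	}
	if global.MaxTokens != nil {
		return *global.MaxTokens
	}
	return def
}

// shadowsGlobal reports the case the layering check warns about: the
// project writes an explicit zero while the global holds a non-default value.
func shadowsGlobal(def int64, global, project layer) bool {
	return project.MaxTokens != nil && *project.MaxTokens == 0 &&
		global.MaxTokens != nil && *global.MaxTokens != def
}

// ptr returns a pointer to v, for building layers in examples.
func ptr(v int64) *int64 { return &v }

func main() {
	const def = 8192 // assumed default, matching the test fixtures
	global := layer{MaxTokens: ptr(4096)}

	// No max_tokens line in the project file: the global value is inherited.
	absent := layer{}
	// max_tokens = 0 in the project file: the zero overrides the global.
	explicitZero := layer{MaxTokens: ptr(0)}

	fmt.Println(resolveMaxTokens(def, global, absent), shadowsGlobal(def, global, absent))
	fmt.Println(resolveMaxTokens(def, global, explicitZero), shadowsGlobal(def, global, explicitZero))
}
```

Non-pointer string fields cannot carry this distinction in the typed struct, which is why the layering check falls back to a raw-map presence lookup for them.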
@@ -0,0 +1,409 @@
+package config
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+)
+
+// TestDiagnose_ValidFileNoFindings sanity-checks the no-op path:
+// a freshly-written config (after upgrade-config) produces zero
+// findings because every field either matches the default or
+// is a legitimate user value.
+func TestDiagnose_ValidFileNoFindings(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	for _, f := range fs {
+		if f.Severity >= SeverityWarn {
+			t.Errorf("unexpected warn/error finding for valid file: %+v", f)
+		}
+	}
+}
+
+// TestDiagnose_MissingFileReturnsErrorFinding verifies the
+// error path: a path that doesn't exist produces a single
+// SeverityError finding.
+func TestDiagnose_MissingFileReturnsErrorFinding(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "nonexistent.toml")
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	if len(fs) != 1 {
+		t.Fatalf("len(findings) = %d, want 1", len(fs))
+	}
+	if fs[0].Severity != SeverityError {
+		t.Errorf("Severity = %v, want SeverityError", fs[0].Severity)
+	}
+	if !strings.Contains(fs[0].Message, "read:") {
+		t.Errorf("Message = %q, want it to mention the read error", fs[0].Message)
+	}
+}
+
+// TestDiagnose_CorruptFileReturnsErrorFinding verifies the
+// parse-error path: a file with invalid TOML produces a
+// SeverityError finding with a parse message.
+func TestDiagnose_CorruptFileReturnsErrorFinding(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[broken\nthis = 'is not valid"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	if len(fs) != 1 {
+		t.Fatalf("len(findings) = %d, want 1", len(fs))
+	}
+	if fs[0].Severity != SeverityError {
+		t.Errorf("Severity = %v, want SeverityError", fs[0].Severity)
+	}
+	if !strings.Contains(fs[0].Message, "parse:") {
+		t.Errorf("Message = %q, want it to mention the parse error", fs[0].Message)
+	}
+}
+
+// TestDiagnose_UnknownTopLevelKeysAreWarned verifies that keys
+// in the source file that don't map to any Config field
+// surface as SeverityWarn findings. Decoder ignores them
+// silently today; doctor surfaces them.
+func TestDiagnose_UnknownTopLevelKeysAreWarned(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[unknown_section]\nfoo = 1\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	found := false
+	for _, f := range fs {
+		if f.Severity == SeverityWarn && strings.Contains(f.Key, "unknown_section") {
+			found = true
+			break
+		}
+	}
+	if !found {
+		t.Errorf("expected warning for unknown_section, got %+v", fs)
+	}
+}
+
+// TestDiagnose_InvalidPermissionModeIsWarned verifies that an
+// invalid permission.mode value surfaces as SeverityWarn.
+// The mode is a string that must be one of the documented
+// permission.Mode constants.
+func TestDiagnose_InvalidPermissionModeIsWarned(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[permission]\nmode = \"yes\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	found := false
+	for _, f := range fs {
+		if f.Severity == SeverityWarn && f.Key == "permission.mode" {
+			found = true
+			if !strings.Contains(f.Message, "yes") {
+				t.Errorf("Message = %q, want it to mention the invalid value 'yes'", f.Message)
+			}
+		}
+	}
+	if !found {
+		t.Errorf("expected warning for invalid permission.mode, got %+v", fs)
+	}
+}
+
+// TestDiagnose_ValidPermissionModeIsClean verifies the
+// "explicit-valid" path: a user-set valid mode produces no
+// finding for permission.mode.
+func TestDiagnose_ValidPermissionModeIsClean(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[permission]\nmode = \"deny\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	for _, f := range fs {
+		if f.Key == "permission.mode" {
+			t.Errorf("unexpected finding for valid mode: %+v", f)
+		}
+	}
+}
+
+// TestDiagnose_InvalidRouterPreferIsWarned verifies that an
+// invalid router.prefer value surfaces as SeverityWarn.
+func TestDiagnose_InvalidRouterPreferIsWarned(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[router]\nprefer = \"yes\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	found := false
+	for _, f := range fs {
+		if f.Severity == SeverityWarn && f.Key == "router.prefer" {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected warning for invalid router.prefer, got %+v", fs)
+	}
+}
+
+// TestDiagnose_ExplicitZeroProviderMaxTokensIsWarned verifies
+// the "explicit zero" case the upgrade-config preserves but
+// the doctor surfaces: a user-set *int64(0) on a pointer
+// field whose default is non-zero is probably a mistake.
+// SeverityWarn (not Error) because the user might have set
+// it intentionally.
+func TestDiagnose_ExplicitZeroProviderMaxTokensIsWarned(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 0\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	found := false
+	for _, f := range fs {
+		if f.Severity == SeverityWarn && f.Key == "provider.max_tokens" {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected warning for explicit-zero max_tokens, got %+v", fs)
+	}
+}
+
+// TestDiagnose_DefaultProviderMaxTokensClean documents the
+// "user set to default" case: the cleaner drops these, and
+// the doctor should NOT warn about them (the user did the
+// right thing by setting an explicit value that matches the
+// default).
+func TestDiagnose_DefaultProviderMaxTokensClean(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+	if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFile(path)
+	for _, f := range fs {
+		if f.Key == "provider.max_tokens" {
+			t.Errorf("unexpected finding for default-equivalent max_tokens: %+v", f)
+		}
+	}
+}
+
+// TestDiagnose_DiagnoseManyAggregates verifies the multi-file
+// API (DiagnoseFiles): paths is a list of files to scan, the
+// result combines the per-file findings.
+func TestDiagnose_DiagnoseManyAggregates(t *testing.T) {
+	dir := t.TempDir()
+	good := filepath.Join(dir, "good.toml")
+	bad := filepath.Join(dir, "bad.toml")
+	_ = os.WriteFile(good, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644)
+	_ = os.WriteFile(bad, []byte("[permission]\nmode = \"yes\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseFiles([]string{good, bad})
+	if len(fs) < 1 {
+		t.Fatalf("len(findings) = %d, want >= 1", len(fs))
+	}
+	// The bad file should contribute at least one finding.
+	foundBad := false
+	for _, f := range fs {
+		if f.Path == bad {
+			foundBad = true
+		}
+	}
+	if !foundBad {
+		t.Errorf("expected finding for %s, got %+v", bad, fs)
+	}
+}
+
+// TestSeverity_String verifies the human-readable form of
+// Severity values for the CLI's text output.
+func TestSeverity_String(t *testing.T) {
+	cases := []struct {
+		sev  Severity
+		want string
+	}{
+		{SeverityInfo, "info"},
+		{SeverityWarn, "warn"},
+		{SeverityError, "error"},
+	}
+	for _, c := range cases {
+		if got := c.sev.String(); got != c.want {
+			t.Errorf("Severity(%d).String() = %q, want %q", c.sev, got, c.want)
+		}
+	}
+}
+
+// TestDiagnoseLayering_ProjectShadowsGlobal_PreferEmpty verifies
+// the original 2026-05-24 silent-corruption bug: the project
+// file has `router.prefer = ""` which shadows the global's
+// `router.prefer = "cloud"`. Doctor must surface this.
+func TestDiagnoseLayering_ProjectShadowsGlobal_PreferEmpty(t *testing.T) {
+	dir := t.TempDir()
+	global := filepath.Join(dir, "global.toml")
+	project := filepath.Join(dir, "project.toml")
+
+	_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
+	_ = os.WriteFile(project, []byte("[router]\nprefer = \"\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(global, project)
+	found := false
+	for _, f := range fs {
+		if f.Key == "router.prefer" && f.Severity == SeverityWarn {
+			found = true
+			if !strings.Contains(f.Message, "shadow") {
+				t.Errorf("Message = %q, want it to mention shadowing", f.Message)
+			}
+		}
+	}
+	if !found {
+		t.Errorf("expected shadowing warning for router.prefer, got %+v", fs)
+	}
+}
+
+// TestDiagnoseLayering_NoShadowWhenValuesMatch verifies the
+// intentional-override path: a project that sets an explicit
+// non-empty value ("local" over "cloud") is not shadowing.
+func TestDiagnoseLayering_NoShadowWhenValuesMatch(t *testing.T) {
+	dir := t.TempDir()
+	global := filepath.Join(dir, "global.toml")
+	project := filepath.Join(dir, "project.toml")
+
+	_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
+	_ = os.WriteFile(project, []byte("[router]\nprefer = \"local\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(global, project)
+	for _, f := range fs {
+		if f.Key == "router.prefer" {
+			t.Errorf("unexpected finding when project overrides global intentionally: %+v", f)
+		}
+	}
+}
+
+// TestDiagnoseLayering_NoShadowWhenProjectInheritsDefault
+// documents the inheritance path: when the project's field
+// is absent (resolves to the default), it inherits the
+// global's value (or the default if global is also default).
+// Neither case is shadowing.
+func TestDiagnoseLayering_NoShadowWhenProjectInheritsDefault(t *testing.T) {
+	dir := t.TempDir()
+	global := filepath.Join(dir, "global.toml")
+	project := filepath.Join(dir, "project.toml")
+
+	// Global has a non-default value, project has no router
+	// section at all. The project inherits the global's "cloud"
+	// — no shadowing.
+	_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
+	_ = os.WriteFile(project, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(global, project)
+	for _, f := range fs {
+		if f.Key == "router.prefer" {
+			t.Errorf("unexpected shadowing finding when project has no [router] section: %+v", f)
+		}
+	}
+}
+
+// TestDiagnoseLayering_ProjectShadowsGlobal_PermissionMode
+// verifies another common shadowing case: project has
+// `permission.mode = ""` while global has `permission.mode =
+// "deny"`. The merged value is "" (default "auto"), silently
+// overriding the user's intent.
+func TestDiagnoseLayering_ProjectShadowsGlobal_PermissionMode(t *testing.T) {
+	dir := t.TempDir()
+	global := filepath.Join(dir, "global.toml")
+	project := filepath.Join(dir, "project.toml")
+
+	_ = os.WriteFile(global, []byte("[permission]\nmode = \"deny\"\n"), 0o644)
+	_ = os.WriteFile(project, []byte("[permission]\nmode = \"\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(global, project)
+	found := false
+	for _, f := range fs {
+		if f.Key == "permission.mode" && f.Severity == SeverityWarn {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected shadowing warning for permission.mode, got %+v", fs)
+	}
+}
+
+// TestDiagnoseLayering_ProjectShadowsGlobal_ProviderDefault
+// documents the provider.default shadowing case: project has
+// empty default, global has a real one. The user's "anthropic"
+// at the global level is silently overridden.
+func TestDiagnoseLayering_ProjectShadowsGlobal_ProviderDefault(t *testing.T) {
+	dir := t.TempDir()
+	global := filepath.Join(dir, "global.toml")
+	project := filepath.Join(dir, "project.toml")
+
+	_ = os.WriteFile(global, []byte("[provider]\ndefault = \"anthropic\"\n"), 0o644)
+	_ = os.WriteFile(project, []byte("[provider]\ndefault = \"\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(global, project)
+	found := false
+	for _, f := range fs {
+		if f.Key == "provider.default" && f.Severity == SeverityWarn {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected shadowing warning for provider.default, got %+v", fs)
+	}
+}
+
+// TestDiagnoseLayering_MissingGlobalIsNoOp documents the
+// "no global config" case: doctor cannot run a layering
+// check without a global baseline, so it returns no findings.
+func TestDiagnoseLayering_MissingGlobalIsNoOp(t *testing.T) {
+	dir := t.TempDir()
+	project := filepath.Join(dir, "project.toml")
+	_ = os.WriteFile(project, []byte("[router]\nprefer = \"\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(filepath.Join(dir, "nonexistent-global.toml"), project)
+	if len(fs) != 0 {
+		t.Errorf("expected no findings when global is missing, got %+v", fs)
+	}
+}
+
+// TestDiagnoseLayering_MissingProjectIsNoOp mirrors the above:
+// without a project file there's nothing to shadow.
+func TestDiagnoseLayering_MissingProjectIsNoOp(t *testing.T) {
+	dir := t.TempDir()
+	global := filepath.Join(dir, "global.toml")
+	_ = os.WriteFile(global, []byte("[router]\nprefer = \"cloud\"\n"), 0o644)
+
+	doc := NewDoctor()
+	fs := doc.DiagnoseLayering(global, filepath.Join(dir, "nonexistent-project.toml"))
+	if len(fs) != 0 {
+		t.Errorf("expected no findings when project is missing, got %+v", fs)
+	}
+}
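The tests above do not pin the order in which `DiagnoseFiles` returns findings. A sketch of that ordering follows (path ascending, then severity descending, then key ascending), using stand-in types. The `finding` and `severity` types and the numeric ordering info < warn < error are assumptions for illustration; only the comparator mirrors the one in `DiagnoseFiles`.

```go
package main

import (
	"fmt"
	"sort"
)

// severity is a hypothetical stand-in; the ordering info < warn < error
// is an assumption for this sketch.
type severity int

const (
	sevInfo severity = iota
	sevWarn
	sevError
)

// finding is a trimmed-down stand-in for the doctor's Finding type.
type finding struct {
	Path     string
	Severity severity
	Key      string
}

// sortFindings orders by path, then by descending severity, then by key.
// The sort is stable, so findings that tie on all three keep their order.
func sortFindings(fs []finding) {
	sort.SliceStable(fs, func(i, j int) bool {
		if fs[i].Path != fs[j].Path {
			return fs[i].Path < fs[j].Path
		}
		if fs[i].Severity != fs[j].Severity {
			return fs[i].Severity > fs[j].Severity
		}
		return fs[i].Key < fs[j].Key
	})
}

func main() {
	fs := []finding{
		{Path: "b.toml", Severity: sevWarn, Key: "router.prefer"},
		{Path: "a.toml", Severity: sevWarn, Key: "provider.max_tokens"},
		{Path: "a.toml", Severity: sevError, Key: ""},
		{Path: "a.toml", Severity: sevWarn, Key: "permission.mode"},
	}
	sortFindings(fs)
	for _, f := range fs {
		fmt.Printf("%s %d %q\n", f.Path, f.Severity, f.Key)
	}
}
```

Within one file, errors sort ahead of warnings, so a read or parse failure appears before any per-key finding in CI output.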
@@ -92,9 +92,26 @@ func ProjectRoot() string {
 }

 func projectConfigPath() string {
+	return ProjectConfigPath()
+}
+
+// ProjectConfigPath returns the path to the project config file
+// for the current working directory (.gnoma/config.toml under
+// the project root). Exported so the `gnoma upgrade-config` CLI
+// (and any future callers that need to point at the project
+// config) can use it.
+func ProjectConfigPath() string {
 	return filepath.Join(ProjectRoot(), ".gnoma", "config.toml")
 }

+// ProjectConfigPathFor returns the project config path for an
+// arbitrary project root. Used by `gnoma doctor --all-projects`
+// to enumerate registry entries without `chdir`-ing into each
+// project.
+func ProjectConfigPathFor(projectRoot string) string {
+	return filepath.Join(projectRoot, ".gnoma", "config.toml")
+}
+
 func applyEnv(cfg *Config) {
 	envKeys := map[string]string{
 		"mistral":   "MISTRAL_API_KEY",
@@ -218,8 +218,8 @@ claude = "claude-work"
 	if cfg.Provider.Model != "claude-base" {
 		t.Errorf("Model = %q, want claude-base (base preserved)", cfg.Provider.Model)
 	}
-	if cfg.Provider.MaxTokens != 4096 {
-		t.Errorf("MaxTokens = %d, want 4096 (base preserved)", cfg.Provider.MaxTokens)
+	if cfg.Provider.MaxTokens == nil || *cfg.Provider.MaxTokens != 4096 {
+		t.Errorf("MaxTokens = %v, want *4096 (base preserved)", cfg.Provider.MaxTokens)
 	}
 	// Map per-key merge.
 	if cfg.Provider.APIKeys["anthropic"] != "BASE_A" {
@@ -0,0 +1,152 @@
+package config
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+	"os"
+	"path/filepath"
+	"sort"
+	"sync"
+	"time"
+)
+
+// ProjectEntry is one row in the project registry. The registry
+// is purely local — written to ~/.config/gnoma/projects.json and
+// never sent off-machine. The shape is stable for the v0.4.x
+// series; the schema-version key is reserved for future
+// migrations.
+type ProjectEntry struct {
+	Path         string    `json:"path"`
+	FirstSeen    time.Time `json:"first_seen"`
+	LastSeen     time.Time `json:"last_seen"`
+	SessionCount int       `json:"session_count"`
+}
+
+// Registry is the on-disk list of projects gnoma has been
+// launched in. Used by:
+//   - `gnoma doctor --all-projects` (Phase 3)
+//   - `gnoma upgrade-config --all` (Phase 4 --all-projects)
+//   - `gnoma sessions --all` picker (cross-project resume)
+//   - `gnoma stats` (local-only aggregate metrics)
+//
+// Loaded once at startup, mutated in-process, saved atomically.
+// The struct is safe for concurrent Record/Prune calls (each
+// call locks the mutex), but in the typical flow only one
+// goroutine (main) writes to it.
+type Registry struct {
+	path string // unexported, so encoding/json never serializes it
+
+	mu       sync.Mutex
+	Projects []ProjectEntry `json:"projects"`
+}
+
+// RegistryFilePath returns the canonical path to the registry
+// file (~/.config/gnoma/projects.json). Exported so callers
+// (and tests) can inspect / delete the file.
+func RegistryFilePath() string {
+	return filepath.Join(GlobalConfigDir(), "projects.json")
+}
+
+// LoadRegistry reads the registry from the canonical path
+// (~/.config/gnoma/projects.json). A missing file is not an
+// error: returns an empty Registry. A corrupt file is an error
+// — silent zero-ing on corruption would let a broken file
+// accumulate stale state indefinitely.
+func LoadRegistry() (*Registry, error) {
+	return LoadRegistryAt(RegistryFilePath())
+}
+
+// LoadRegistryAt is the testable variant: load the registry
+// from an explicit path instead of the canonical one. Used by
+// the test suite to keep `~/.config/gnoma/projects.json`
+// untouched.
+func LoadRegistryAt(path string) (*Registry, error) {
+	r := &Registry{path: path}
+	data, err := os.ReadFile(path)
+	if err != nil {
+		if os.IsNotExist(err) {
+			return r, nil
+		}
+		return nil, fmt.Errorf("read registry: %w", err)
+	}
+	if err := json.Unmarshal(data, r); err != nil {
+		return nil, fmt.Errorf("parse registry: %w", err)
+	}
+	return r, nil
+}
+
+// Record adds or updates the entry for projectRoot. Bumps
+// LastSeen and SessionCount for an existing entry; appends a
+// fresh row for a new path. Saves atomically.
+//
+// Empty projectRoot is an error: calling Record with "" is a
+// programmer error. Path normalization (e.g. resolving
+// symlinks) is the caller's responsibility; ProjectRoot() in
+// load.go already returns an absolute path so the typical
+// caller doesn't need to think about it.
+func (r *Registry) Record(projectRoot string) error {
+	if projectRoot == "" {
+		return errors.New("project root is empty")
+	}
+	r.mu.Lock()
+	defer r.mu.Unlock()
+
+	now := time.Now().UTC()
+	for i := range r.Projects {
+		if r.Projects[i].Path == projectRoot {
+			r.Projects[i].LastSeen = now
+			r.Projects[i].SessionCount++
+			return r.saveLocked()
+		}
+	}
+	r.Projects = append(r.Projects, ProjectEntry{
+		Path:         projectRoot,
+		FirstSeen:    now,
+		LastSeen:     now,
+		SessionCount: 1,
+	})
+	return r.saveLocked()
+}
+
+// Prune removes entries whose LastSeen lies more than
+// staleBefore in the past. Returns the (sorted) list of pruned
+// paths so callers can surface them in user-facing output
+// (e.g. `gnoma doctor`). No-op when nothing is stale.
+func (r *Registry) Prune(staleBefore time.Duration) ([]string, error) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+
+	cutoff := time.Now().UTC().Add(-staleBefore)
+	var pruned []string
+	var kept []ProjectEntry
+	for _, p := range r.Projects {
+		if p.LastSeen.Before(cutoff) {
+			pruned = append(pruned, p.Path)
+		} else {
+			kept = append(kept, p)
+		}
+	}
+	if len(pruned) == 0 {
+		return nil, nil
+	}
+	sort.Strings(pruned)
+	r.Projects = kept
+	if err := r.saveLocked(); err != nil {
+		return pruned, err
+	}
+	return pruned, nil
+}
+
+// saveLocked writes the registry to disk atomically. The
+// caller must hold r.mu.
+func (r *Registry) saveLocked() error {
+	if err := os.MkdirAll(filepath.Dir(r.path), 0o755); err != nil {
+		return fmt.Errorf("create registry dir: %w", err)
+	}
+	data, err := json.MarshalIndent(r, "", "  ")
+	if err != nil {
+		return fmt.Errorf("marshal registry: %w", err)
+	}
+	return writeAtomicBytes(r.path, data)
+}
@@ -0,0 +1,357 @@
+package config
+
+import (
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"sort"
+	"strings"
+	"testing"
+	"time"
+)
+
+// TestRegistry_LoadAt_MissingFileReturnsEmpty verifies the
+// "no file yet" path: LoadRegistryAt returns a fresh, empty
+// registry with no error, so first-run users don't see a
+// "no such file" error.
+func TestRegistry_LoadAt_MissingFileReturnsEmpty(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	reg, err := LoadRegistryAt(path)
+	if err != nil {
+		t.Fatalf("LoadRegistryAt: %v", err)
+	}
+	if reg == nil {
+		t.Fatal("LoadRegistryAt returned nil registry")
+	}
+	if len(reg.Projects) != 0 {
+		t.Errorf("len(Projects) = %d, want 0", len(reg.Projects))
+	}
+}
+
+// TestRegistry_LoadAt_ValidFileParses verifies the load path
+// against a known-good file written by a previous save.
+func TestRegistry_LoadAt_ValidFileParses(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	seed := Registry{
+		Projects: []ProjectEntry{
+			{
+				Path:         "/home/user/git/foo",
+				FirstSeen:    time.Date(2026, 4, 15, 10, 30, 0, 0, time.UTC),
+				LastSeen:     time.Date(2026, 5, 24, 19, 23, 0, 0, time.UTC),
+				SessionCount: 47,
+			},
+		},
+	}
+	data, _ := json.MarshalIndent(&seed, "", "  ")
+	if err := os.WriteFile(path, data, 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	reg, err := LoadRegistryAt(path)
+	if err != nil {
+		t.Fatalf("LoadRegistryAt: %v", err)
+	}
+	if len(reg.Projects) != 1 {
+		t.Fatalf("len(Projects) = %d, want 1", len(reg.Projects))
+	}
+	got := reg.Projects[0]
+	if got.Path != "/home/user/git/foo" {
+		t.Errorf("Path = %q, want /home/user/git/foo", got.Path)
+	}
+	if got.SessionCount != 47 {
+		t.Errorf("SessionCount = %d, want 47", got.SessionCount)
+	}
+}
+
+// TestRegistry_LoadAt_CorruptFileErrors verifies that a malformed
+// JSON file produces an error, not a silent zero-valued registry.
+// Silent zero-ing would let file corruption go unnoticed.
+func TestRegistry_LoadAt_CorruptFileErrors(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+	if err := os.WriteFile(path, []byte("{ this is not valid json"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	_, err := LoadRegistryAt(path)
+	if err == nil {
+		t.Fatal("LoadRegistryAt on corrupt file returned nil error")
+	}
+}
+
+// TestRegistry_Record_AddsNewProject verifies the first-record
+// path: a new path gets a fresh entry with FirstSeen == LastSeen
+// and SessionCount == 1.
+func TestRegistry_Record_AddsNewProject(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	reg, _ := LoadRegistryAt(path)
+	if err := reg.Record("/home/user/git/foo"); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+	if len(reg.Projects) != 1 {
+		t.Fatalf("len(Projects) = %d, want 1", len(reg.Projects))
+	}
+	p := reg.Projects[0]
+	if p.Path != "/home/user/git/foo" {
+		t.Errorf("Path = %q, want /home/user/git/foo", p.Path)
+	}
+	if !p.FirstSeen.Equal(p.LastSeen) {
+		t.Errorf("FirstSeen=%v != LastSeen=%v (should be equal on first record)", p.FirstSeen, p.LastSeen)
+	}
+	if p.SessionCount != 1 {
+		t.Errorf("SessionCount = %d, want 1", p.SessionCount)
+	}
+}
+
+// TestRegistry_Record_BumpsExistingProject verifies the
+// second-record path: a project that's already in the registry
+// gets LastSeen updated and SessionCount incremented; FirstSeen
+// is preserved.
+func TestRegistry_Record_BumpsExistingProject(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	reg, _ := LoadRegistryAt(path)
+	if err := reg.Record("/home/user/git/foo"); err != nil {
+		t.Fatalf("first Record: %v", err)
+	}
+	firstSeen := reg.Projects[0].FirstSeen
+
+	// Sleep so the second Record's time.Now() is strictly later
+	// than the first. time.Time comparisons have nanosecond
+	// resolution, so a 2ms gap is ample.
+	time.Sleep(2 * time.Millisecond)
+	if err := reg.Record("/home/user/git/foo"); err != nil {
+		t.Fatalf("second Record: %v", err)
+	}
+	if len(reg.Projects) != 1 {
+		t.Fatalf("len(Projects) = %d, want 1 (no duplicate)", len(reg.Projects))
+	}
+	p := reg.Projects[0]
+	if p.SessionCount != 2 {
+		t.Errorf("SessionCount = %d, want 2", p.SessionCount)
+	}
+	if !p.FirstSeen.Equal(firstSeen) {
+		t.Errorf("FirstSeen changed: %v → %v", firstSeen, p.FirstSeen)
+	}
+	if !p.LastSeen.After(firstSeen) {
+		t.Errorf("LastSeen=%v not after FirstSeen=%v", p.LastSeen, firstSeen)
+	}
+}
+
+// TestRegistry_Record_EmptyPathReturnsError verifies the
+// input-validation path. An empty project root is a programmer
+// error, not a silent no-op.
+func TestRegistry_Record_EmptyPathReturnsError(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+	reg, _ := LoadRegistryAt(path)
+
+	if err := reg.Record(""); err == nil {
+		t.Error("Record(\"\") returned nil error, want error")
+	}
+}
+
+// TestRegistry_Record_AtomicWriteLeavesNoTemp verifies the
+// atomic-write hygiene: after a successful Record, no .tmp-*
+// file is left in the directory.
+func TestRegistry_Record_AtomicWriteLeavesNoTemp(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	reg, _ := LoadRegistryAt(path)
+	if err := reg.Record("/home/user/git/foo"); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+
+	entries, err := os.ReadDir(dir)
+	if err != nil {
+		t.Fatalf("ReadDir: %v", err)
+	}
+	for _, e := range entries {
+		if e.Name() != "projects.json" {
+			t.Errorf("unexpected leftover file: %q", e.Name())
+		}
+	}
+}
+
+// TestRegistry_Record_PersistsAcrossReload verifies the
+// save/load contract: a Record followed by a fresh Load
+// returns the updated data.
+func TestRegistry_Record_PersistsAcrossReload(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	reg, _ := LoadRegistryAt(path)
+	if err := reg.Record("/home/user/git/foo"); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+	if err := reg.Record("/home/user/git/bar"); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+
+	// Fresh load (simulates a new process).
+	reloaded, err := LoadRegistryAt(path)
+	if err != nil {
+		t.Fatalf("re-Load: %v", err)
+	}
+	if len(reloaded.Projects) != 2 {
+		t.Fatalf("len(Projects) = %d, want 2", len(reloaded.Projects))
+	}
+	// Order is not guaranteed; check both paths present.
+	paths := []string{reloaded.Projects[0].Path, reloaded.Projects[1].Path}
+	sort.Strings(paths)
+	want := []string{"/home/user/git/bar", "/home/user/git/foo"}
+	for i, p := range want {
+		if paths[i] != p {
+			t.Errorf("paths[%d] = %q, want %q", i, paths[i], p)
+		}
+	}
+}
+
+// TestRegistry_Save_CreatesDirectoryIfMissing verifies the
+// "first save" path: the registry file lives in a directory
+// that may not exist yet. Save should create the directory
+// rather than fail.
+func TestRegistry_Save_CreatesDirectoryIfMissing(t *testing.T) {
+	dir := t.TempDir()
+	deepPath := filepath.Join(dir, "nested", "deeper", "projects.json")
+
+	reg, _ := LoadRegistryAt(deepPath)
+	if err := reg.Record("/home/user/git/foo"); err != nil {
+		t.Fatalf("Record: %v", err)
+	}
+	if _, err := os.Stat(deepPath); err != nil {
+		t.Errorf("expected file at %s, got %v", deepPath, err)
+	}
+}
+
+// TestRegistry_Prune_RemovesStaleEntries verifies the core
+// pruning semantic: entries with LastSeen older than the
+// cutoff are removed; the rest are kept.
+func TestRegistry_Prune_RemovesStaleEntries(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	now := time.Now().UTC()
+	reg := &Registry{path: path, Projects: []ProjectEntry{
+		{Path: "/stale/1", FirstSeen: now.Add(-100 * 24 * time.Hour), LastSeen: now.Add(-90 * 24 * time.Hour), SessionCount: 5},
+		{Path: "/fresh/1", FirstSeen: now.Add(-1 * 24 * time.Hour), LastSeen: now.Add(-1 * time.Hour), SessionCount: 10},
+		{Path: "/stale/2", FirstSeen: now.Add(-200 * 24 * time.Hour), LastSeen: now.Add(-60 * 24 * time.Hour), SessionCount: 1},
+		{Path: "/fresh/2", FirstSeen: now, LastSeen: now, SessionCount: 1},
+	}}
+
+	pruned, err := reg.Prune(30 * 24 * time.Hour) // 30 days
+	if err != nil {
+		t.Fatalf("Prune: %v", err)
+	}
+	if len(pruned) != 2 {
+		t.Errorf("len(pruned) = %d, want 2 (got %v)", len(pruned), pruned)
+	}
+	if len(reg.Projects) != 2 {
+		t.Errorf("len(Projects) = %d, want 2", len(reg.Projects))
+	}
+	for _, p := range reg.Projects {
+		if !strings.HasPrefix(p.Path, "/fresh/") {
+			t.Errorf("stale project %q survived prune", p.Path)
+		}
+	}
+}
+
+// TestRegistry_Prune_KeepsRecentEntries documents the inverse
+// case: nothing to prune returns an empty list and no save.
+func TestRegistry_Prune_KeepsRecentEntries(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	now := time.Now().UTC()
+	reg := &Registry{path: path, Projects: []ProjectEntry{
+		{Path: "/fresh/1", FirstSeen: now, LastSeen: now, SessionCount: 1},
+		{Path: "/fresh/2", FirstSeen: now, LastSeen: now.Add(-1 * time.Hour), SessionCount: 2},
+	}}
+
+	pruned, err := reg.Prune(30 * 24 * time.Hour)
+	if err != nil {
+		t.Fatalf("Prune: %v", err)
+	}
+	if len(pruned) != 0 {
+		t.Errorf("len(pruned) = %d, want 0 (got %v)", len(pruned), pruned)
+	}
+	if len(reg.Projects) != 2 {
+		t.Errorf("len(Projects) = %d, want 2", len(reg.Projects))
+	}
+}
+
+// TestRegistry_Prune_ReportsPrunedPaths verifies the return
+// value: the pruned paths are returned to the caller for
+// reporting (e.g. `gnoma doctor` could surface this).
+func TestRegistry_Prune_ReportsPrunedPaths(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	now := time.Now().UTC()
+	reg := &Registry{path: path, Projects: []ProjectEntry{
+		{Path: "/z/last-stale", FirstSeen: now.Add(-100 * 24 * time.Hour), LastSeen: now.Add(-90 * 24 * time.Hour)},
+		{Path: "/a/first-stale", FirstSeen: now.Add(-200 * 24 * time.Hour), LastSeen: now.Add(-60 * 24 * time.Hour)},
+	}}
+
+	pruned, _ := reg.Prune(30 * 24 * time.Hour)
+	if len(pruned) != 2 {
+		t.Fatalf("len(pruned) = %d, want 2", len(pruned))
+	}
+	// Sorted for deterministic caller output.
+	if pruned[0] != "/a/first-stale" || pruned[1] != "/z/last-stale" {
+		t.Errorf("pruned = %v, want sorted [/a/first-stale /z/last-stale]", pruned)
+	}
+}
+
+// TestRegistry_Prune_EmptyRegistryIsNoOp verifies the
+// "nothing to prune" edge case on an empty registry.
+func TestRegistry_Prune_EmptyRegistryIsNoOp(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+	reg := &Registry{path: path}
+
+	pruned, err := reg.Prune(30 * 24 * time.Hour)
+	if err != nil {
+		t.Fatalf("Prune: %v", err)
+	}
+	if len(pruned) != 0 {
+		t.Errorf("len(pruned) = %d, want 0", len(pruned))
+	}
+}
+
+// TestRegistry_Prune_PersistsAcrossReload verifies that the
+// pruned state is written to disk and visible after a fresh
+// LoadRegistryAt. The save happens inside Prune; the reload
+// confirms it.
+func TestRegistry_Prune_PersistsAcrossReload(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "projects.json")
+
+	now := time.Now().UTC()
+	reg := &Registry{path: path, Projects: []ProjectEntry{
+		{Path: "/stale", FirstSeen: now.Add(-100 * 24 * time.Hour), LastSeen: now.Add(-90 * 24 * time.Hour)},
+		{Path: "/fresh", FirstSeen: now, LastSeen: now},
+	}}
+	if _, err := reg.Prune(30 * 24 * time.Hour); err != nil {
+		t.Fatalf("Prune: %v", err)
+	}
+
+	reloaded, err := LoadRegistryAt(path)
+	if err != nil {
+		t.Fatalf("re-Load: %v", err)
+	}
+	if len(reloaded.Projects) != 1 {
+		t.Errorf("len(Projects) after reload = %d, want 1", len(reloaded.Projects))
+	}
+	if len(reloaded.Projects) == 1 && reloaded.Projects[0].Path != "/fresh" {
+		t.Errorf("reloaded project = %q, want /fresh", reloaded.Projects[0].Path)
+	}
+}
@@ -0,0 +1,223 @@
+package config
+
+import "time"
+
+// ResolvedConfig is the post-Load view of a Config: each
+// pointer-converted field has been dereferenced, with the default
+// substituted for nil (Provider.Temperature is passed through as a
+// *float64). Consumers should read cfg.Resolved().X for the fields
+// listed in the resolver table; raw cfg.X remains valid for the
+// string / map / slice fields that kept their non-pointer types.
+//
+// This mirrors the ResolvedSafetySection pattern: a separate mirror
+// type whose construction is the boundary where "user omitted the
+// key" and "user set it to the zero value" stop being ambiguous.
+//
+// Non-pointer fields are copied through unchanged where a section
+// mirrors them (e.g. Provider.Default); the rest (e.g.
+// BanditSection) are omitted and read from the source Config.
+type ResolvedConfig struct {
+	// ProjectRegistry mirrors Config.ProjectRegistry. nil →
+	// default (true, registry enabled); *false → registry
+	// disabled. Lives at the top level because it gates a
+	// gnoma-wide behavior (writing to projects.json), not a
+	// section's behavior.
+	ProjectRegistry bool
+
+	Provider ResolvedProviderSection
+	Tools    ResolvedToolsSection
+	Security ResolvedSecuritySection
+	Router   ResolvedRouterSection
+	Session  ResolvedSessionSection
+	SLM      ResolvedSLMSection
+	Hooks    []ResolvedHook
+}
+
+// ResolvedProviderSection is ProviderSection with all pointer
+// fields dereferenced.
+type ResolvedProviderSection struct {
+	Default     string
+	Model       string
+	MaxTokens   int64
+	Temperature *float64
+	APIKeys     map[string]string
+	Endpoints   map[string]string
+}
+
+// ResolvedToolsSection is ToolsSection with pointer fields
+// dereferenced. BashTimeout is left as a time.Duration so the
+// `Duration == 0` sentinel "use built-in default" can be checked
+// by consumers that care.
+type ResolvedToolsSection struct {
+	BashTimeout time.Duration
+	MaxFileSize int64
+}
+
+// ResolvedSecuritySection is SecuritySection with pointer fields
+// dereferenced.
+type ResolvedSecuritySection struct {
+	EntropyThreshold  float64
+	RedactHighEntropy bool
+	EntropySafelist   []string
+	Patterns          []PatternConfig
+}
+
+// ResolvedRouterSection is RouterSection with pointer fields
+// dereferenced. Bandit is omitted — its 0-sentinel pattern is
+// documented at the source struct and read directly via
+// cfg.Router.Bandit.
+type ResolvedRouterSection struct {
+	ForceTwoStage bool
+	Prefer        string
+}
+
+// ResolvedSessionSection is SessionSection with pointer fields
+// dereferenced.
+type ResolvedSessionSection struct {
+	MaxKeep int
+}
+
+// ResolvedSLMSection is SLMSection with pointer-converted fields
+// dereferenced. Added in the 2026-06-04 follow-up to Phase 1 of
+// the config-migration plan; see
+// docs/superpowers/plans/2026-06-04-config-migration-followups.md.
+// Enabled is copied as-is (its 0-sentinel pattern still applies).
+// RegisterAsArm is a *bool in the source and resolves here with
+// nil → true, matching the nil→true handling at the call sites
+// (see internal/slm/arm.go).
+type ResolvedSLMSection struct {
+	Enabled         bool
+	Backend         string
+	Model           string
+	BaseURL         string
+	ModelURL        string
+	DataDir         string
+	ExpectedSHA256  string
+	StartupTimeout  time.Duration
+	ClassifyTimeout time.Duration
+	RegisterAsArm   bool
+}
+
+// ResolvedHook is HookConfig with FailOpen dereferenced. All other
+// fields are pass-through copies.
+type ResolvedHook struct {
+	Name        string
+	Event       string
+	Type        string
+	Exec        string
+	Timeout     string
+	FailOpen    bool
+	ToolPattern string
+}
+
+// Resolved builds a ResolvedConfig from a Config, substituting
+// Defaults() values for any nil pointer fields. Called once at the
+// end of LoadWithProfile (and LoadBase) so all consumer code reads
+// resolved values; raw layered structs are internal.
+func (c *Config) Resolved() *ResolvedConfig {
+	d := Defaults()
+
+	projectRegistry := true
+	if c.Settings.ProjectRegistry != nil {
+		projectRegistry = *c.Settings.ProjectRegistry
+	}
+
+	provider := ResolvedProviderSection{
+		Default:     c.Provider.Default,
+		Model:       c.Provider.Model,
+		MaxTokens:   *d.Provider.MaxTokens,
+		Temperature: c.Provider.Temperature,
+		APIKeys:     c.Provider.APIKeys,
+		Endpoints:   c.Provider.Endpoints,
+	}
+	if c.Provider.MaxTokens != nil {
+		provider.MaxTokens = *c.Provider.MaxTokens
+	}
+
+	tools := ResolvedToolsSection{
+		BashTimeout: d.Tools.BashTimeout.Duration(),
+		MaxFileSize: *d.Tools.MaxFileSize,
+	}
+	if c.Tools.BashTimeout != 0 {
+		tools.BashTimeout = c.Tools.BashTimeout.Duration()
+	}
+	if c.Tools.MaxFileSize != nil {
+		tools.MaxFileSize = *c.Tools.MaxFileSize
+	}
+
+	security := ResolvedSecuritySection{
+		EntropyThreshold:  *d.Security.EntropyThreshold,
+		RedactHighEntropy: *d.Security.RedactHighEntropy,
+		EntropySafelist:   c.Security.EntropySafelist,
+		Patterns:          c.Security.Patterns,
+	}
+	if c.Security.EntropyThreshold != nil {
+		security.EntropyThreshold = *c.Security.EntropyThreshold
+	}
+	if c.Security.RedactHighEntropy != nil {
+		security.RedactHighEntropy = *c.Security.RedactHighEntropy
+	}
+
+	router := ResolvedRouterSection{
+		ForceTwoStage: *d.Router.ForceTwoStage,
+		Prefer:        c.Router.Prefer,
+	}
+	if c.Router.ForceTwoStage != nil {
+		router.ForceTwoStage = *c.Router.ForceTwoStage
+	}
+
+	session := ResolvedSessionSection{
+		MaxKeep: *d.Session.MaxKeep,
+	}
+	if c.Session.MaxKeep != nil {
+		session.MaxKeep = *c.Session.MaxKeep
+	}
+
+	slm := ResolvedSLMSection{
+		Enabled:         c.SLM.Enabled,
+		Backend:         c.SLM.Backend,
+		Model:           c.SLM.Model,
+		BaseURL:         c.SLM.BaseURL,
+		ModelURL:        c.SLM.ModelURL,
+		DataDir:         c.SLM.DataDir,
+		ExpectedSHA256:  c.SLM.ExpectedSHA256,
+		StartupTimeout:  d.SLM.StartupTimeout.Duration(),
+		ClassifyTimeout: d.SLM.ClassifyTimeout.Duration(),
+		// RegisterAsArm: nil → default (true), explicit *true → true,
+		// explicit *false → false. The default-true case preserves
+		// pre-config behaviour where the SLM is always registered as
+		// an execution arm in addition to its classifier role.
+		RegisterAsArm: c.SLM.RegisterAsArm == nil || *c.SLM.RegisterAsArm,
+	}
+	if c.SLM.StartupTimeout != nil {
+		slm.StartupTimeout = c.SLM.StartupTimeout.Duration()
+	}
+	if c.SLM.ClassifyTimeout != nil {
+		slm.ClassifyTimeout = c.SLM.ClassifyTimeout.Duration()
+	}
+
+	hooks := make([]ResolvedHook, len(c.Hooks))
+	for i, h := range c.Hooks {
+		failOpen := h.FailOpen != nil && *h.FailOpen
+		hooks[i] = ResolvedHook{
+			Name:        h.Name,
+			Event:       h.Event,
+			Type:        h.Type,
+			Exec:        h.Exec,
+			Timeout:     h.Timeout,
+			FailOpen:    failOpen,
+			ToolPattern: h.ToolPattern,
+		}
+	}
+
+	return &ResolvedConfig{
+		ProjectRegistry: projectRegistry,
+		Provider:        provider,
+		Tools:           tools,
+		Security:        security,
+		Router:          router,
+		Session:         session,
+		SLM:             slm,
+		Hooks:           hooks,
+	}
+}
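Resolved() above repeats one pattern per field: start from the default, then overwrite only when the source pointer is non-nil. A stripped-down sketch of that pattern for a single field follows. The type names and `intPtr` are hypothetical, and the default of 20 is an illustrative value, not something read from Defaults().

```go
package main

import "fmt"

// rawSession and resolvedSession are hypothetical, trimmed-down
// stand-ins for a raw config section and its resolved mirror.
type rawSession struct {
	MaxKeep *int // nil means "key omitted in the file"
}

type resolvedSession struct {
	MaxKeep int
}

// defaultMaxKeep is an illustrative default for this sketch.
const defaultMaxKeep = 20

// resolve substitutes the default only when the pointer is nil, so an
// explicit zero set by the user survives resolution.
func resolve(raw rawSession) resolvedSession {
	out := resolvedSession{MaxKeep: defaultMaxKeep}
	if raw.MaxKeep != nil {
		out.MaxKeep = *raw.MaxKeep
	}
	return out
}

// intPtr returns a pointer to v, for building literals.
func intPtr(v int) *int { return &v }

func main() {
	fmt.Println(resolve(rawSession{}).MaxKeep)                    // omitted key: 20
	fmt.Println(resolve(rawSession{MaxKeep: intPtr(0)}).MaxKeep)  // explicit zero: 0
	fmt.Println(resolve(rawSession{MaxKeep: intPtr(50)}).MaxKeep) // explicit value: 50
}
```

With a plain `int` field the first two cases would be indistinguishable after decoding, which is the ambiguity the pointer conversion removes.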
@@ -0,0 +1,274 @@
+package config
+
+import (
+	"testing"
+	"time"
+)
+
+// i64p returns a pointer to its argument. Test helper for
+// constructing literal `*int64` values without a temporary variable.
+func i64p(v int64) *int64 { return &v }
+
+// ip returns a pointer to its argument. Test helper for
+// constructing literal `*int` values.
+func ip(v int) *int { return &v }
+
+// bp returns a pointer to its argument. Test helper for
+// constructing literal `*bool` values.
+func bp(v bool) *bool { return &v }
+
+// fp64 returns a pointer to its argument. Test helper for
+// constructing literal `*float64` values.
+func fp64(v float64) *float64 { return &v }
+
+// TestResolve_SubstitutesDefaultsForNilPointers verifies that pointer
+// fields left nil after TOML decode (i.e. user didn't set them) get
+// the default value at resolve time. This is the core of the
+// zero-spam fix: the file is allowed to omit the field, and the
+// consumer still sees the default.
+func TestResolve_SubstitutesDefaultsForNilPointers(t *testing.T) {
+	cfg := &Config{} // zero: every pointer is nil
+	resolved := cfg.Resolved()
+
+	if resolved.Provider.MaxTokens != 8192 {
+		t.Errorf("Resolved.Provider.MaxTokens = %d, want 8192 (default)", resolved.Provider.MaxTokens)
+	}
+	if resolved.Tools.MaxFileSize != 1<<20 {
+		t.Errorf("Resolved.Tools.MaxFileSize = %d, want %d (default)", resolved.Tools.MaxFileSize, 1<<20)
+	}
+	if resolved.Security.EntropyThreshold != 4.5 {
+		t.Errorf("Resolved.Security.EntropyThreshold = %v, want 4.5 (default)", resolved.Security.EntropyThreshold)
+	}
+	if resolved.Security.RedactHighEntropy {
+		t.Errorf("Resolved.Security.RedactHighEntropy = true, want false (default)")
+	}
+	if resolved.Router.ForceTwoStage {
+		t.Errorf("Resolved.Router.ForceTwoStage = true, want false (default)")
+	}
+	if resolved.Session.MaxKeep != 20 {
+		t.Errorf("Resolved.Session.MaxKeep = %d, want 20 (default)", resolved.Session.MaxKeep)
+	}
+	if resolved.Router.Prefer != "" {
+		t.Errorf("Resolved.Router.Prefer = %q, want empty (no default)", resolved.Router.Prefer)
+	}
+}
+
+// TestResolve_PreservesExplicitValues verifies that explicit user-set
+// values (non-nil pointers) survive resolution untouched.
+func TestResolve_PreservesExplicitValues(t *testing.T) {
+	cfg := &Config{
+		Provider: ProviderSection{
+			MaxTokens:   i64p(16384),
+			Temperature: fp64(0.7),
+		},
+		Tools: ToolsSection{
+			MaxFileSize: i64p(2 << 20),
+		},
+		Security: SecuritySection{
+			EntropyThreshold:  fp64(5.0),
+			RedactHighEntropy: bp(true),
+		},
+		Router: RouterSection{
+			ForceTwoStage: bp(true),
+			Prefer:        "cloud",
+		},
+		Session: SessionSection{
+			MaxKeep: ip(50),
+		},
+	}
+	resolved := cfg.Resolved()
+	if resolved.Provider.MaxTokens != 16384 {
+		t.Errorf("Resolved.Provider.MaxTokens = %d, want 16384 (user-set)", resolved.Provider.MaxTokens)
+	}
+	if resolved.Tools.MaxFileSize != 2<<20 {
+		t.Errorf("Resolved.Tools.MaxFileSize = %d, want %d (user-set)", resolved.Tools.MaxFileSize, 2<<20)
+	}
+	if resolved.Security.EntropyThreshold != 5.0 {
+		t.Errorf("Resolved.Security.EntropyThreshold = %v, want 5.0 (user-set)", resolved.Security.EntropyThreshold)
+	}
+	if !resolved.Security.RedactHighEntropy {
+		t.Error("Resolved.Security.RedactHighEntropy = false, want true (user-set)")
+	}
+	if !resolved.Router.ForceTwoStage {
+		t.Error("Resolved.Router.ForceTwoStage = false, want true (user-set)")
+	}
+	if resolved.Router.Prefer != "cloud" {
+		t.Errorf("Resolved.Router.Prefer = %q, want cloud (user-set)", resolved.Router.Prefer)
+	}
+	if resolved.Session.MaxKeep != 50 {
+		t.Errorf("Resolved.Session.MaxKeep = %d, want 50 (user-set)", resolved.Session.MaxKeep)
+	}
+}
+
+// TestResolve_ExplicitZeroPreserved verifies that a user who sets
+// `max_tokens = 0` (a *int64 pointing to 0) gets 0 back from the
+// resolver — the pointer is non-nil so the default is not substituted.
+// This is the critical "0 means something the user actually wants"
+// case the pointer conversion exists to preserve.
+func TestResolve_ExplicitZeroPreserved(t *testing.T) {
+	cfg := &Config{
+		Provider: ProviderSection{
+			MaxTokens: i64p(0),
+		},
+		Session: SessionSection{
+			MaxKeep: ip(0),
+		},
+	}
+	resolved := cfg.Resolved()
+	if resolved.Provider.MaxTokens != 0 {
+		t.Errorf("Resolved.Provider.MaxTokens = %d, want 0 (explicit zero)", resolved.Provider.MaxTokens)
+	}
+	if resolved.Session.MaxKeep != 0 {
+		t.Errorf("Resolved.Session.MaxKeep = %d, want 0 (explicit zero)", resolved.Session.MaxKeep)
+	}
+}
+
+// TestResolve_HookFailOpen_NilDefaultsToFalse verifies that a hook
+// with no `fail_open` key gets the documented default (false) in
+// resolution. The HookConfig doc-comment says default is false
+// ("fail closed" / deny-on-error behaviour).
+func TestResolve_HookFailOpen_NilDefaultsToFalse(t *testing.T) {
+	cfg := &Config{
+		Hooks: []HookConfig{
+			{Name: "log-tools", Event: "pre_tool_use", Type: "command", Exec: "/bin/true"},
+		},
+	}
+	resolved := cfg.Resolved()
+	if len(resolved.Hooks) != 1 {
+		t.Fatalf("len(Resolved.Hooks) = %d, want 1", len(resolved.Hooks))
+	}
+	if resolved.Hooks[0].FailOpen {
+		t.Error("Resolved.Hooks[0].FailOpen = true, want false (default)")
+	}
+	if resolved.Hooks[0].Name != "log-tools" {
+		t.Errorf("Resolved.Hooks[0].Name = %q, want log-tools", resolved.Hooks[0].Name)
+	}
+	if resolved.Hooks[0].Exec != "/bin/true" {
+		t.Errorf("Resolved.Hooks[0].Exec = %q, want /bin/true", resolved.Hooks[0].Exec)
+	}
+}
+
+// TestResolve_HookFailOpen_ExplicitTrue verifies that a hook with
+// `fail_open = true` in TOML keeps true in resolution.
+func TestResolve_HookFailOpen_ExplicitTrue(t *testing.T) {
+	cfg := &Config{
+		Hooks: []HookConfig{
+			{Name: "dangerous", Event: "pre_tool_use", Type: "command", Exec: "/bin/true", FailOpen: bp(true)},
+		},
+	}
+	resolved := cfg.Resolved()
+	if !resolved.Hooks[0].FailOpen {
+		t.Error("Resolved.Hooks[0].FailOpen = false, want true (explicit)")
+	}
+}
+
+// TestResolve_NonPointerFieldsPassthrough verifies that string/slice
+// fields on the mirror are passed through from the source Config
+// without default substitution. Only the pointer-converted fields
+// get the resolver treatment; the rest are read directly via cfg.X.
+func TestResolve_NonPointerFieldsPassthrough(t *testing.T) {
+	cfg := &Config{
+		Provider: ProviderSection{
+			Default: "anthropic",
+			Model:   "claude-opus-4-7",
+		},
+		Security: SecuritySection{
+			EntropySafelist: []string{"uuid", "sha_hex"},
+		},
+	}
+	resolved := cfg.Resolved()
+	if resolved.Provider.Default != "anthropic" {
+		t.Errorf("Resolved.Provider.Default = %q, want anthropic", resolved.Provider.Default)
+	}
+	if resolved.Provider.Model != "claude-opus-4-7" {
+		t.Errorf("Resolved.Provider.Model = %q, want claude-opus-4-7", resolved.Provider.Model)
+	}
+	if len(resolved.Security.EntropySafelist) != 2 ||
+		resolved.Security.EntropySafelist[0] != "uuid" {
+		t.Errorf("Resolved.Security.EntropySafelist = %v, want [uuid sha_hex]", resolved.Security.EntropySafelist)
+	}
+}
+
+// TestResolve_SLMSection_StartupTimeoutDefaultsTo5s verifies that
+// the SLM section's pointer-converted Duration fields (added in the
+// 2026-06-04 follow-up to Phase 1) get the documented defaults.
+// StartupTimeout's default is 5s (the llamafile first-launch budget);
+// ClassifyTimeout's default is 0 (which the SLM layer maps to its
+// own 15s budget).
+func TestResolve_SLMSection_StartupTimeoutDefaultsTo5s(t *testing.T) {
+	cfg := &Config{} // every pointer nil
+	resolved := cfg.Resolved()
+
+	if resolved.SLM.StartupTimeout != 5*time.Second {
+		t.Errorf("Resolved.SLM.StartupTimeout = %v, want 5s (default)", resolved.SLM.StartupTimeout)
+	}
+	if resolved.SLM.ClassifyTimeout != 0 {
+		t.Errorf("Resolved.SLM.ClassifyTimeout = %v, want 0 (default — use SLM-layer 15s)", resolved.SLM.ClassifyTimeout)
+	}
+}
+
+// TestResolve_SLMSection_ExplicitDurationsPreserved verifies that
+// user-set Duration values survive resolution untouched.
+func TestResolve_SLMSection_ExplicitDurationsPreserved(t *testing.T) {
+	startup := Duration(30 * time.Second)
+	classify := Duration(45 * time.Second)
+	cfg := &Config{
+		SLM: SLMSection{
+			StartupTimeout:  &startup,
+			ClassifyTimeout: &classify,
+		},
+	}
+	resolved := cfg.Resolved()
+	if resolved.SLM.StartupTimeout != 30*time.Second {
+		t.Errorf("Resolved.SLM.StartupTimeout = %v, want 30s (user-set)", resolved.SLM.StartupTimeout)
+	}
+	if resolved.SLM.ClassifyTimeout != 45*time.Second {
+		t.Errorf("Resolved.SLM.ClassifyTimeout = %v, want 45s (user-set)", resolved.SLM.ClassifyTimeout)
+	}
+}
+
+// TestResolve_SLMSection_ExplicitZeroPreserved verifies that
+// *Duration(0) (the documented "use built-in default" sentinel for
+// both fields) is preserved as 0 in the resolved view.
+func TestResolve_SLMSection_ExplicitZeroPreserved(t *testing.T) {
+	startup := Duration(0)
+	classify := Duration(0)
+	cfg := &Config{
+		SLM: SLMSection{
+			StartupTimeout:  &startup,
+			ClassifyTimeout: &classify,
+		},
+	}
+	resolved := cfg.Resolved()
+	if resolved.SLM.StartupTimeout != 0 {
+		t.Errorf("Resolved.SLM.StartupTimeout = %v, want 0 (explicit zero)", resolved.SLM.StartupTimeout)
+	}
+	if resolved.SLM.ClassifyTimeout != 0 {
+		t.Errorf("Resolved.SLM.ClassifyTimeout = %v, want 0 (explicit zero)", resolved.SLM.ClassifyTimeout)
+	}
+}
+
+// TestResolve_ProjectRegistryDefaultsToTrue verifies the
+// Phase 2 mirror: nil pointer → default (true, registry
+// enabled). Preserves the v0.3.x "always record" behavior.
+func TestResolve_ProjectRegistryDefaultsToTrue(t *testing.T) {
+	cfg := &Config{}
+	resolved := cfg.Resolved()
+	if !resolved.ProjectRegistry {
+		t.Errorf("Resolved.ProjectRegistry = false, want true (default)")
+	}
+}
+
+// TestResolve_ProjectRegistry_ExplicitFalse verifies that a
+// user who sets `[config].project_registry = false` gets
+// false in the resolved view.
+func TestResolve_ProjectRegistry_ExplicitFalse(t *testing.T) {
+	v := false
+	cfg := &Config{
+		Settings: SettingsSection{ProjectRegistry: &v},
+	}
+	resolved := cfg.Resolved()
+	if resolved.ProjectRegistry {
+		t.Errorf("Resolved.ProjectRegistry = true, want false (explicit opt-out)")
+	}
+}
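One consequence of the resolver tests above: because nil and "explicitly set to the default" resolve to the same value, a redundant explicit value can be dropped from a file without changing the resolved view, while an explicit zero that differs from the default must stay. A small sketch of that round-trip property for one field, assuming hypothetical helpers `resolve`, `cleanField`, and `intPtr`, and reusing 20 because the MaxKeep test above expects it:

```go
package main

import "fmt"

// defaultMaxKeep mirrors the default the resolver tests expect for
// Session.MaxKeep; it is hard-coded here for illustration only.
const defaultMaxKeep = 20

// resolve returns the effective value: the default for nil, else the
// pointed-to value.
func resolve(p *int) int {
	if p == nil {
		return defaultMaxKeep
	}
	return *p
}

// cleanField returns nil when the pointer merely restates the default,
// and the pointer unchanged otherwise (including an explicit zero).
func cleanField(p *int) *int {
	if p != nil && *p == defaultMaxKeep {
		return nil
	}
	return p
}

// intPtr returns a pointer to v, for building literals.
func intPtr(v int) *int { return &v }

func main() {
	for _, in := range []*int{nil, intPtr(20), intPtr(0), intPtr(50)} {
		out := cleanField(in)
		// First column: resolved value unchanged by cleaning.
		// Second column: whether the key was dropped (or absent).
		fmt.Println(resolve(in) == resolve(out), out == nil)
	}
}
```

Running the cleaner a second time on its own output changes nothing, which is the idempotency property a config rewriter would typically want to assert.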
@@ -0,0 +1,298 @@
+package config
+
+import (
+	"bytes"
+	"fmt"
+	"os"
+	"path/filepath"
+	"time"
+
+	"github.com/BurntSushi/toml"
+)
+
+// UpgradeResult is what Upgrade returns: a description of what
+// changed, plus a human-readable diff the CLI can print for the
+// user to verify. BackupPath is empty when no work was done.
+type UpgradeResult struct {
+	Changed    bool
+	BackupPath string
+	Diff       string
+}
+
+// Upgrade reads the config at path, applies the cleaning pass
+// (drops fields whose value matches the resolved default, leaves
+// explicit-zero pointer fields alone), and atomically writes the
+// cleaned form to the same path. The original is preserved at
+// `<path>.bak-YYYYMMDD-HHMMSS`.
+//
+// Single-file mode only — `--all-projects` is deferred to the
+// Phase 2 project registry work in the 2026-05-24 config-
+// migration plan.
+//
+// The cleaning rules per field type:
+//
+//   - Pointer-converted fields: drop (set to nil) iff the
+//     resolved value equals the resolved default. Explicit-zero
+//     pointer values that differ from the default are kept.
+//
+//   - Non-pointer string / map / slice fields: encoder's
+//     `omitempty` already drops Go-zero values on rewrite. The
+//     cleaner doesn't need to touch them.
+//
+//   - Non-pointer numeric / bool fields: same as non-pointer
+//     string — encoder drops Go-zero via `omitempty`. The
+//     documented 0-sentinel pattern (e.g. `TUI.Vim`, `Bandit`)
+//     intentionally has Go zero == default, so this is correct.
+//
+// The contract: the resolved view of the cleaned file is
+// byte-identical to the resolved view of the original (modulo
+// cosmetic whitespace). Idempotency test in upgrade_test.go
+// asserts this.
+func Upgrade(path string) (UpgradeResult, error) {
+	original, err := os.ReadFile(path)
+	if err != nil {
+		return UpgradeResult{}, fmt.Errorf("read config: %w", err)
+	}
+
+	var src Config
+	if _, decErr := toml.Decode(string(original), &src); decErr != nil {
+		return UpgradeResult{}, fmt.Errorf("decode config: %w", decErr)
+	}
+
+	// Encode the *original* (uncleaned) state for diff/compare
+	// BEFORE clean() mutates the struct in place.
+	var beforeBuf bytes.Buffer
+	if err := toml.NewEncoder(&beforeBuf).Encode(&src); err != nil {
+		return UpgradeResult{}, fmt.Errorf("encode before: %w", err)
+	}
+
+	clean(&src)
+
+	// Encode the cleaned state.
+	var afterBuf bytes.Buffer
+	if err := toml.NewEncoder(&afterBuf).Encode(&src); err != nil {
+		return UpgradeResult{}, fmt.Errorf("encode after: %w", err)
+	}
+	before := beforeBuf.Bytes()
+	after := afterBuf.Bytes()
+
+	if bytes.Equal(before, after) {
+		return UpgradeResult{Changed: false}, nil
+	}
+
+	// Two-step write: rename the original to .bak-<timestamp>,
+	// then atomic-write the new content to the original path. If
+	// either step fails, the original content is still on disk
+	// (at path or at backupPath); it is never deleted before the
+	// new content is durably committed.
+	backupPath, err := backupPathFor(path)
+	if err != nil {
+		return UpgradeResult{}, err
+	}
+	if err := os.Rename(path, backupPath); err != nil {
+		return UpgradeResult{}, fmt.Errorf("rename original to backup: %w", err)
+	}
+	if err := writeAtomicBytes(path, after); err != nil {
+		// Best-effort restore: the rename above moved the original
+		// to backupPath, so nothing exists at the canonical path.
+		// Try to move the backup back; if that also fails, the
+		// original is still recoverable from backupPath.
+		_ = os.Rename(backupPath, path)
+		return UpgradeResult{}, fmt.Errorf("write cleaned config: %w", err)
+	}
+
+	return UpgradeResult{
+		Changed:    true,
+		BackupPath: backupPath,
+		Diff:       lineDiff(string(before), string(after)),
+	}, nil
+}
+
+// clean nils out, in place, the pointer-converted fields whose
+// value matches the resolved default, and returns the same
+// *Config. Non-pointer fields are left unchanged; the encoder's
+// `omitempty` handles their Go-zero cases on write.
+//
+// clean mutates cfg rather than allocating a fresh Config
+// because the pointer fields reference shared memory between
+// sections (e.g. `cfg.Provider.MaxTokens` and
+// `Defaults().Provider.MaxTokens` are both *int64). Returning
+// the same struct with selective nulling keeps the data flow
+// obvious. Callers that need the pre-clean state must capture
+// it first, as Upgrade does by encoding before calling clean.
+func clean(cfg *Config) *Config {
+	d := Defaults()
+	resolvedSrc := cfg.Resolved()
+	resolvedDef := d.Resolved()
+
+	// Provider.MaxTokens
+	if cfg.Provider.MaxTokens != nil && resolvedSrc.Provider.MaxTokens == resolvedDef.Provider.MaxTokens {
+		cfg.Provider.MaxTokens = nil
+	}
+
+	// Tools.MaxFileSize
+	if cfg.Tools.MaxFileSize != nil && resolvedSrc.Tools.MaxFileSize == resolvedDef.Tools.MaxFileSize {
+		cfg.Tools.MaxFileSize = nil
+	}
+
+	// Security.EntropyThreshold
+	if cfg.Security.EntropyThreshold != nil && resolvedSrc.Security.EntropyThreshold == resolvedDef.Security.EntropyThreshold {
+		cfg.Security.EntropyThreshold = nil
+	}
+	// Security.RedactHighEntropy
+	if cfg.Security.RedactHighEntropy != nil && resolvedSrc.Security.RedactHighEntropy == resolvedDef.Security.RedactHighEntropy {
+		cfg.Security.RedactHighEntropy = nil
+	}
+
+	// Router.ForceTwoStage
+	if cfg.Router.ForceTwoStage != nil && resolvedSrc.Router.ForceTwoStage == resolvedDef.Router.ForceTwoStage {
+		cfg.Router.ForceTwoStage = nil
+	}
+
+	// Session.MaxKeep
+	if cfg.Session.MaxKeep != nil && resolvedSrc.Session.MaxKeep == resolvedDef.Session.MaxKeep {
+		cfg.Session.MaxKeep = nil
+	}
+
+	// SLM.StartupTimeout / SLM.ClassifyTimeout
+	if cfg.SLM.StartupTimeout != nil && resolvedSrc.SLM.StartupTimeout == resolvedDef.SLM.StartupTimeout {
+		cfg.SLM.StartupTimeout = nil
+	}
+	if cfg.SLM.ClassifyTimeout != nil && resolvedSrc.SLM.ClassifyTimeout == resolvedDef.SLM.ClassifyTimeout {
+		cfg.SLM.ClassifyTimeout = nil
+	}
+	// SLM.RegisterAsArm: default is true; only null when
+	// explicitly set to true (the default-true case).
+	if cfg.SLM.RegisterAsArm != nil && *cfg.SLM.RegisterAsArm == resolvedDef.SLM.RegisterAsArm {
+		cfg.SLM.RegisterAsArm = nil
+	}
+
+	// HookConfig.FailOpen per entry. The default is false, so an
+	// explicit false is nulled. Dereferencing the source pointer
+	// avoids assuming resolvedSrc.Hooks has the same length as
+	// cfg.Hooks.
+	for i := range cfg.Hooks {
+		if cfg.Hooks[i].FailOpen != nil && !*cfg.Hooks[i].FailOpen {
+			cfg.Hooks[i].FailOpen = nil
+		}
+	}
+
+	return cfg
+}
+
+// backupPathFor returns a timestamped backup path for path,
+// using the local-time YYYYMMDD-HHMMSS format the original plan
+// specified (second-level resolution). Two runs that both change
+// the file within the same second would map to the same backup
+// path, and os.Rename would then replace the earlier backup. In
+// practice a second run right after a successful upgrade is a
+// no-op that creates no backup (see TestUpgrade_Idempotent).
+func backupPathFor(path string) (string, error) {
+	t := time.Now()
+	suffix := t.Format("20060102-150405")
+	return fmt.Sprintf("%s.bak-%s", path, suffix), nil
+}
+
+// writeAtomicBytes writes the given bytes to path via temp file
+// + rename. Used by Upgrade (which has already produced the
+// bytes) and is a more general version of writeAtomicTOML.
+func writeAtomicBytes(path string, data []byte) error {
+	dir := filepath.Dir(path)
+	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
+	if err != nil {
+		return fmt.Errorf("create temp: %w", err)
+	}
+	tmpName := tmp.Name()
+	cleanup := func() { _ = os.Remove(tmpName) }
+
+	if _, err := tmp.Write(data); err != nil {
+		_ = tmp.Close()
+		cleanup()
+		return fmt.Errorf("write temp: %w", err)
+	}
+	if err := tmp.Sync(); err != nil {
+		_ = tmp.Close()
+		cleanup()
+		return fmt.Errorf("sync temp: %w", err)
+	}
+	if err := tmp.Close(); err != nil {
+		cleanup()
+		return fmt.Errorf("close temp: %w", err)
+	}
+	if err := os.Rename(tmpName, path); err != nil {
+		cleanup()
+		return fmt.Errorf("rename temp: %w", err)
+	}
+	return nil
+}
+
+// lineDiff returns a simple line-by-line diff between before and
+// after. Lines removed from before are prefixed with `-`, lines
+// added in after are prefixed with `+`, unchanged lines are
+// prefixed with ` ` (space). Header lines give the file lengths.
+//
+// Not a true Myers / Hunt–Szymanski diff — a long edit can
+// produce noisy output. Adequate for the gnoma use case where
+// config files are small (tens of lines) and the user wants
+// visual confirmation that the cleaning is doing the right
+// thing. If a more sophisticated diff is ever needed,
+// `github.com/pmezard/go-difflib` is already a transitive dep
+// (see go.sum) and can be vendored.
+func lineDiff(before, after string) string {
+	var b bytes.Buffer
+	fmt.Fprintf(&b, "--- before (%d bytes)\n", len(before))
+	fmt.Fprintf(&b, "+++ after  (%d bytes)\n", len(after))
+	bs := splitLines(before)
+	as := splitLines(after)
+
+	// Naive greedy walk over both slices (not a true LCS; config
+	// files are small). Equal lines are emitted as unchanged. A
+	// line of after that does not occur in the rest of before is
+	// emitted as `+`; otherwise the current line of before is
+	// emitted as `-` until the two sides line up again.
+	i, j := 0, 0
+	for i < len(bs) || j < len(as) {
+		switch {
+		case i < len(bs) && j < len(as) && bs[i] == as[j]:
+			fmt.Fprintf(&b, " %s\n", bs[i])
+			i++
+			j++
+		case j < len(as) && (i == len(bs) || !contains(bs[i:], as[j])):
+			fmt.Fprintf(&b, "+ %s\n", as[j])
+			j++
+		case i < len(bs):
+			fmt.Fprintf(&b, "- %s\n", bs[i])
+			i++
+		}
+	}
+	return b.String()
+}
+
+// splitLines returns the lines of s without their '\n'
+// terminators. A trailing '\n' does not produce a trailing
+// empty line, and an empty s yields nil. The result is suitable
+// for line-by-line diffing.
+func splitLines(s string) []string {
+	if s == "" {
+		return nil
+	}
+	out := []string{}
+	start := 0
+	for i := 0; i < len(s); i++ {
+		if s[i] == '\n' {
+			out = append(out, s[start:i])
+			start = i + 1
+		}
+	}
+	if start < len(s) {
+		out = append(out, s[start:])
+	}
+	return out
+}
+
+// contains reports whether v appears in s. Used by lineDiff to
+// check whether a line of after still occurs later in before.
+func contains(s []string, v string) bool {
+	for _, x := range s {
+		if x == v {
+			return true
+		}
+	}
+	return false
+}
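The greedy walk in lineDiff can be sketched on its own to see what it emits for a small edit. This is an illustrative re-implementation, not the repository's code: `greedyDiff` and `containsLine` are hypothetical names, and the sketch works on pre-split lines and returns the prefixed lines rather than a single string.

```go
package main

import "fmt"

// containsLine reports whether v occurs anywhere in s.
func containsLine(s []string, v string) bool {
	for _, x := range s {
		if x == v {
			return true
		}
	}
	return false
}

// greedyDiff mirrors the two-pointer walk in lineDiff: equal
// lines are unchanged, a line of after that no longer occurs in
// the rest of before is an addition, anything else is a removal.
func greedyDiff(before, after []string) []string {
	var out []string
	i, j := 0, 0
	for i < len(before) || j < len(after) {
		switch {
		case i < len(before) && j < len(after) && before[i] == after[j]:
			out = append(out, " "+before[i])
			i++
			j++
		case j < len(after) && (i == len(before) || !containsLine(before[i:], after[j])):
			out = append(out, "+ "+after[j])
			j++
		case i < len(before):
			out = append(out, "- "+before[i])
			i++
		}
	}
	return out
}

func main() {
	before := []string{"a", "b", "c"}
	after := []string{"a", "c", "d"}
	for _, line := range greedyDiff(before, after) {
		fmt.Println(line)
	}
}
```

For this input the walk keeps `a` and `c`, reports `b` as removed and `d` as added. Because it never backtracks, a reordered block can show up as a run of removals followed by additions, which is the noise the doc comment on lineDiff warns about.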
@@ -0,0 +1,309 @@
+package config
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+)
+
+// TestUpgrade_DropsPointerFieldAtDefault verifies the core
+// cleaning semantic for pointer-converted fields: a file
+// containing `max_tokens = 8192` (the documented default, user
+// explicitly set to it) gets the field nulled in the rewritten
+// file. The cleaner compares resolved values; matching the
+// default means the field is dropped.
+//
+// Non-pointer string fields (like `mode = ""`) are dropped
+// automatically by the encoder's `omitempty` on the
+// read+rewrite cycle, so they don't need the cleaner's help.
+// This test focuses on the pointer-converted case that the
+// cleaner was designed for.
+func TestUpgrade_DropsPointerFieldAtDefault(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	original := "[provider]\nmax_tokens = 8192\n"
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	res, err := Upgrade(path)
+	if err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+	if !res.Changed {
+		t.Errorf("Upgrade.Changed = false, want true (max_tokens at default should be dropped)")
+	}
+
+	got, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("read upgraded: %v", err)
+	}
+	body := string(got)
+
+	if strings.Contains(body, "max_tokens") {
+		t.Errorf("max_tokens at default not dropped, got:\n%s", body)
+	}
+	if strings.Contains(body, "[provider]") {
+		t.Errorf("[provider] block should be omitted after cleaning, got:\n%s", body)
+	}
+}
+
+// TestUpgrade_KeepsExplicitUserValues verifies that user-set
+// non-default values survive the cleaning untouched.
+func TestUpgrade_KeepsExplicitUserValues(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	original := `[provider]
+default = "anthropic"
+max_tokens = 16384
+
+[permission]
+mode = "deny"
+`
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if _, err := Upgrade(path); err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+
+	got, _ := os.ReadFile(path)
+	body := string(got)
+
+	for _, want := range []string{
+		`default = "anthropic"`,
+		`max_tokens = 16384`,
+		`mode = "deny"`,
+	} {
+		if !strings.Contains(body, want) {
+			t.Errorf("cleaned file missing %q, got:\n%s", want, body)
+		}
+	}
+}
+
+// TestUpgrade_KeepsExplicitZeroPointerFields verifies the
+// pointer-conversion contract: a user who sets `*int64(0)`
+// explicitly (resolved to 0, which differs from the default
+// 8192) keeps the field in the cleaned file. This is the
+// "explicit zero preserved" case the Phase 1 hybrid exists for.
+func TestUpgrade_KeepsExplicitZeroPointerFields(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	original := `[provider]
+max_tokens = 0
+`
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if _, err := Upgrade(path); err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+
+	got, _ := os.ReadFile(path)
+	body := string(got)
+
+	if !strings.Contains(body, "max_tokens = 0") {
+		t.Errorf("explicit zero max_tokens = 0 was dropped, got:\n%s", body)
+	}
+}
+
+// TestUpgrade_BackupFileCreated verifies the atomic two-step
+// write: the original is renamed to `<path>.bak-YYYYMMDD-HHMMSS`
+// and the cleaned content lands at the original path. The
+// timestamp suffix is deterministic enough to pattern-match.
+func TestUpgrade_BackupFileCreated(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	// Use a pointer-converted field at the default so the cleaner
+	// actually mutates the struct (and Changed becomes true).
+	original := "[provider]\nmax_tokens = 8192\n"
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	res, err := Upgrade(path)
+	if err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+	if !res.Changed {
+		t.Fatal("Upgrade.Changed = false, want true; cannot verify backup creation")
+	}
+	if res.BackupPath == "" {
+		t.Errorf("Upgrade.BackupPath = empty, want non-empty")
+	}
+	if !strings.HasPrefix(res.BackupPath, path+".bak-") {
+		t.Errorf("BackupPath = %q, want prefix %q", res.BackupPath, path+".bak-")
+	}
+	backup, err := os.ReadFile(res.BackupPath)
+	if err != nil {
+		t.Fatalf("read backup: %v", err)
+	}
+	if string(backup) != original {
+		t.Errorf("backup content = %q, want %q", backup, original)
+	}
+}
+
+// TestUpgrade_Idempotent verifies the core promise: running
+// upgrade twice on the same file produces a no-op the second
+// time. No second backup is created; the file content is
+// unchanged; the result reports Changed=false on the second run.
+func TestUpgrade_Idempotent(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	// Mix: one explicit user value (default = "anthropic") and
+	// one pointer-converted field at the default (max_tokens = 8192).
+	// The cleaner drops the max_tokens; the user value is kept.
+	original := "[provider]\ndefault = \"anthropic\"\nmax_tokens = 8192\n"
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	first, err := Upgrade(path)
+	if err != nil {
+		t.Fatalf("first Upgrade: %v", err)
+	}
+	if !first.Changed {
+		t.Errorf("first Upgrade.Changed = false, want true")
+	}
+
+	second, err := Upgrade(path)
+	if err != nil {
+		t.Fatalf("second Upgrade: %v", err)
+	}
+	if second.Changed {
+		t.Errorf("second Upgrade.Changed = true, want false (idempotent)")
+	}
+	if second.BackupPath != "" {
+		t.Errorf("second Upgrade.BackupPath = %q, want empty (no second backup)", second.BackupPath)
+	}
+}
+
+// TestUpgrade_NoChangesOnAlreadyCleanFile verifies the no-op
+// case: a file that already has only user-set non-default
+// values produces Changed=false and no backup. This is the
+// baseline — the user runs upgrade-config and gets told
+// "nothing to do".
+func TestUpgrade_NoChangesOnAlreadyCleanFile(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	clean := "[provider]\ndefault = \"anthropic\"\n"
+	if err := os.WriteFile(path, []byte(clean), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	res, err := Upgrade(path)
+	if err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+	if res.Changed {
+		t.Errorf("Upgrade.Changed = true on already-clean file")
+	}
+	if res.BackupPath != "" {
+		t.Errorf("Upgrade.BackupPath = %q, want empty", res.BackupPath)
+	}
+}
+
+// TestUpgrade_DiffPopulatedWhenChanged verifies the human-readable
+// diff is populated whenever the file changed. CLI prints this
+// for the user to verify the cleaning is doing the right thing.
+func TestUpgrade_DiffPopulatedWhenChanged(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	// Use a pointer-converted field at the default so Changed=true.
+	if err := os.WriteFile(path, []byte("[provider]\nmax_tokens = 8192\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	res, err := Upgrade(path)
+	if err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+	if !res.Changed {
+		t.Fatal("Upgrade.Changed = false, want true; cannot verify diff content")
+	}
+	if res.Diff == "" {
+		t.Errorf("Upgrade.Diff = empty, want non-empty when Changed=true")
+	}
+	if !strings.Contains(res.Diff, "max_tokens") {
+		t.Errorf("Diff does not mention the changed field, got:\n%s", res.Diff)
+	}
+}
+
+// TestUpgrade_PreservesDurationFields verifies the
+// 2026-06-04 Caveat 1 fix interacts correctly with the cleaner:
+// a user-set Duration (e.g. classify_timeout = "20s") is kept
+// because it's not the default (the default is *Duration(0) for
+// ClassifyTimeout, mapped to time.Duration(0) at the resolver).
+func TestUpgrade_PreservesDurationFields(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	original := "[slm]\nclassify_timeout = \"20s\"\n"
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if _, err := Upgrade(path); err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+
+	got, _ := os.ReadFile(path)
+	body := string(got)
+
+	if !strings.Contains(body, "classify_timeout") {
+		t.Errorf("user-set Duration was dropped, got:\n%s", body)
+	}
+}
+
+// TestUpgrade_KeepsExplicitZeroDuration documents the *opposite*
+// of the "drops" cases: a file with `startup_timeout = 0` (the
+// previous zero-spam from the pre-Caveat-1 int64 encoder) is
+// KEPT, because the resolved value via *Duration is 0 which
+// differs from the documented default of 5s. The user's
+// explicit-zero is preserved — this is the "explicit zero"
+// contract the pointer-conversion exists for.
+func TestUpgrade_KeepsExplicitZeroDuration(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	original := "[slm]\nstartup_timeout = 0\n"
+	if err := os.WriteFile(path, []byte(original), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if _, err := Upgrade(path); err != nil {
+		t.Fatalf("Upgrade: %v", err)
+	}
+
+	got, _ := os.ReadFile(path)
+	body := string(got)
+
+	if !strings.Contains(body, "startup_timeout") {
+		t.Errorf("startup_timeout was dropped (expected kept; resolved 0 != default 5s), got:\n%s", body)
+	}
+	_ = time.Second
+}
+
+// TestUpgrade_NonexistentFileIsError verifies the input-validation
+// path. A missing source file is a user error, not a silent
+// success.
+func TestUpgrade_NonexistentFileIsError(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "nonexistent.toml")
+
+	_, err := Upgrade(path)
+	if err == nil {
+		t.Fatal("Upgrade on missing file succeeded, want error")
+	}
+}
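The three cases these tests pin down (unset, explicitly set to the default, explicitly set to zero) can be reproduced with a single pointer field. The following is a minimal sketch under stated assumptions: `providerConfig`, `resolve`, `cleanMaxTokens`, and `ptr` are hypothetical stand-ins, and the default of 8192 is taken from the test comments above rather than from the real `Defaults()`.

```go
package main

import "fmt"

// providerConfig is a hypothetical stand-in for one config
// section with a pointer-converted field.
type providerConfig struct {
	MaxTokens *int64
}

// defaultMaxTokens is assumed from the tests in this diff.
const defaultMaxTokens int64 = 8192

// resolve maps an unset (nil) field to the default.
func resolve(p *int64) int64 {
	if p == nil {
		return defaultMaxTokens
	}
	return *p
}

// cleanMaxTokens nils the field in place when it was set
// explicitly to the default, and reports whether it did so.
func cleanMaxTokens(c *providerConfig) bool {
	if c.MaxTokens != nil && resolve(c.MaxTokens) == defaultMaxTokens {
		c.MaxTokens = nil
		return true
	}
	return false
}

func ptr(v int64) *int64 { return &v }

func main() {
	cases := []*int64{nil, ptr(8192), ptr(0), ptr(16384)}
	for _, p := range cases {
		c := providerConfig{MaxTokens: p}
		dropped := cleanMaxTokens(&c)
		fmt.Printf("dropped=%v kept=%v\n", dropped, c.MaxTokens != nil)
	}
}
```

Only the explicit-default case is dropped. The explicit zero survives because its resolved value (0) differs from the default, which is the contract TestUpgrade_KeepsExplicitZeroPointerFields checks.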
@@ -22,24 +22,33 @@ func SetGlobalConfig(key, value string) error {
 }

 func setConfig(path, key, value string) error {
-	allowed := map[string]bool{
-		"provider.default": true,
-		"provider.model":   true,
-		"permission.mode":  true,
-		"slm.model_url":    true,
-		"slm.enabled":      true,
-		"slm.data_dir":     true,
-		"tui.theme":        true,
-		"tui.vim":          true,
-	}
-	if !allowed[key] {
-		return fmt.Errorf("unknown config key %q (supported: %s)", key, strings.Join(allowedKeys(), ", "))
+	if !isAllowedKey(key) {
+		return fmt.Errorf("unknown config key %q (supported: %s)", key, strings.Join(AllowedKeys(), ", "))
 	}

-	// Load existing config or start fresh
+	// Ensure directory exists before the read so a fresh project
+	// can be created without a parent .gnoma/ in place.
+	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
+		return fmt.Errorf("create config dir: %w", err)
+	}
+
+	// Read existing config into a zero Config; decode overlays
+	// whatever the user has set so the round-trip preserves their
+	// values. Pointer-converted fields decode as `nil` when the key
+	// is absent and as `*T(...)` when present; omitempty on the
+	// encoder keeps absent fields out of the rewritten file. This
+	// is the fix for the zero-spam silent-corruption bug: a fresh
+	// setConfig call no longer emits the entire zero-valued struct.
 	var cfg Config
 	if data, err := os.ReadFile(path); err == nil {
-		toml.Decode(string(data), &cfg) //nolint:errcheck
+		if _, decErr := toml.Decode(string(data), &cfg); decErr != nil {
+			// Existing file is broken; overwrite it with the
+			// caller's change rather than failing closed. The
+			// user's intent for the broken file is "set this
+			// key" — preserving every other corrupt line is
+			// less useful than a clean write.
+			cfg = Config{}
+		}
 	}
 	if cfg.Provider.APIKeys == nil {
 		cfg.Provider.APIKeys = make(map[string]string)
@@ -68,29 +77,58 @@ func setConfig(path, key, value string) error {
 		cfg.TUI.Vim = value == "true"
 	}

-	// Ensure directory exists
-	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
-		return fmt.Errorf("create config dir: %w", err)
-	}
+	return writeAtomicTOML(path, cfg)
+}

-	// Write
-	f, err := os.Create(path)
+// writeAtomicTOML writes cfg to path via temp-file + rename so a
+// crash mid-write can never leave a half-written config file at
+// the canonical path. The temp file lives in the same directory
+// (so the rename is on the same filesystem) and uses a .tmp-*
+// suffix that any other reader will skip.
+func writeAtomicTOML(path string, cfg Config) error {
+	dir := filepath.Dir(path)
+	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
 	if err != nil {
-		return fmt.Errorf("create config file: %w", err)
+		return fmt.Errorf("create temp config file: %w", err)
 	}
-	enc := toml.NewEncoder(f)
-	encErr := enc.Encode(cfg)
-	closeErr := f.Close()
-	if encErr != nil {
-		return encErr
+	tmpName := tmp.Name()
+	cleanup := func() { _ = os.Remove(tmpName) }
+
+	enc := toml.NewEncoder(tmp)
+	if encErr := enc.Encode(cfg); encErr != nil {
+		_ = tmp.Close()
+		cleanup()
+		return fmt.Errorf("encode config: %w", encErr)
 	}
-	if closeErr != nil {
-		return fmt.Errorf("close config file: %w", closeErr)
+	if err := tmp.Sync(); err != nil {
+		_ = tmp.Close()
+		cleanup()
+		return fmt.Errorf("sync config: %w", err)
+	}
+	if err := tmp.Close(); err != nil {
+		cleanup()
+		return fmt.Errorf("close temp config: %w", err)
+	}
+	if err := os.Rename(tmpName, path); err != nil {
+		cleanup()
+		return fmt.Errorf("rename temp config: %w", err)
 	}
 	return nil
 }

-func allowedKeys() []string {
+func isAllowedKey(key string) bool {
+	for _, k := range AllowedKeys() {
+		if k == key {
+			return true
+		}
+	}
+	return false
+}
+
+// AllowedKeys returns the list of dotted config keys that
+// `gnoma config set` accepts. Exported so the CLI subcommand can
+// present the same list in its help text and validation.
+func AllowedKeys() []string {
 	return []string{
 		"provider.default", "provider.model", "permission.mode",
 		"slm.model_url", "slm.enabled", "slm.data_dir",
@@ -0,0 +1,200 @@
+package config
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+)
+
+// TestSetProjectConfig_FreshFileWritesOnlyTheKey verifies the core
+// fix: a `setConfig` call on a non-existent file writes ONLY the
+// key the user is setting, with no zero-spam. This is what stops
+// `gnoma config set provider.default anthropic` from emitting
+// `permission.mode = ""` and silently shadowing a global setting.
+//
+// Regression test for the 2026-05-24 silent-corruption symptom.
+func TestSetProjectConfig_FreshFileWritesOnlyTheKey(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	if err := setConfig(path, "provider.default", "anthropic"); err != nil {
+		t.Fatalf("setConfig: %v", err)
+	}
+
+	data, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("read result: %v", err)
+	}
+	body := string(data)
+
+	if !strings.Contains(body, "default = \"anthropic\"") {
+		t.Errorf("result missing the set value, got:\n%s", body)
+	}
+	if strings.Contains(body, "permission") {
+		t.Errorf("result contains [permission] zero-spam, got:\n%s", body)
+	}
+	if strings.Contains(body, "mode") {
+		t.Errorf("result contains 'mode' key (likely zero-spam), got:\n%s", body)
+	}
+	if strings.Contains(body, "max_tokens") {
+		t.Errorf("result contains 'max_tokens' (zero-spam from non-pointer default), got:\n%s", body)
+	}
+}
+
+// TestSetProjectConfig_RoundTripPreservesUserValues verifies that
+// the user's previously-set values survive a second `setConfig` call.
+// The encoder doesn't drop fields that were in the source.
+func TestSetProjectConfig_RoundTripPreservesUserValues(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	if err := setConfig(path, "permission.mode", "deny"); err != nil {
+		t.Fatalf("first setConfig: %v", err)
+	}
+	if err := setConfig(path, "provider.default", "anthropic"); err != nil {
+		t.Fatalf("second setConfig: %v", err)
+	}
+
+	data, _ := os.ReadFile(path)
+	body := string(data)
+
+	if !strings.Contains(body, "default = \"anthropic\"") {
+		t.Errorf("second setConfig lost the new value, got:\n%s", body)
+	}
+	if !strings.Contains(body, "mode = \"deny\"") {
+		t.Errorf("second setConfig lost the prior permission.mode, got:\n%s", body)
+	}
+}
+
+// TestSetProjectConfig_ReplacesZeroSpamForSetField verifies the
+// user-recovery path: a file already polluted with `mode = ""`
+// zero-spam gets corrected when the user re-sets that key.
+func TestSetProjectConfig_ReplacesZeroSpamForSetField(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	// Pre-populate with a zero-spammed value.
+	if err := os.WriteFile(path, []byte("[permission]\nmode = \"\"\n"), 0o644); err != nil {
+		t.Fatalf("seed: %v", err)
+	}
+
+	if err := setConfig(path, "permission.mode", "auto"); err != nil {
+		t.Fatalf("setConfig: %v", err)
+	}
+
+	data, _ := os.ReadFile(path)
+	body := string(data)
+
+	if strings.Contains(body, "mode = \"\"") {
+		t.Errorf("zero-spam mode=\"\" not replaced, got:\n%s", body)
+	}
+	if !strings.Contains(body, "mode = \"auto\"") {
+		t.Errorf("new value not present, got:\n%s", body)
+	}
+}
+
+// TestSetProjectConfig_RejectsUnknownKey verifies the allowlist
+// guard. Unknown keys must error, not silently no-op.
+func TestSetProjectConfig_RejectsUnknownKey(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	err := setConfig(path, "not.a.real.key", "x")
+	if err == nil {
+		t.Fatal("expected error for unknown key, got nil")
+	}
+	if !strings.Contains(err.Error(), "unknown config key") {
+		t.Errorf("error %q does not contain \"unknown config key\"", err)
+	}
+	if _, statErr := os.Stat(path); !os.IsNotExist(statErr) {
+		t.Errorf("file was created on rejection: stat err = %v", statErr)
+	}
+}
+
+// TestSetProjectConfig_AtomicWriteLeavesNoTempFile verifies that
+// the write is atomic: after a successful call, no .tmp or similar
+// file remains in the config directory.
+func TestSetProjectConfig_AtomicWriteLeavesNoTempFile(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	if err := setConfig(path, "tui.theme", "dracula"); err != nil {
+		t.Fatalf("setConfig: %v", err)
+	}
+
+	entries, err := os.ReadDir(dir)
+	if err != nil {
+		t.Fatalf("ReadDir: %v", err)
+	}
+	for _, e := range entries {
+		if e.Name() != "config.toml" {
+			t.Errorf("unexpected leftover file: %q", e.Name())
+		}
+	}
+}
+
+// TestSetProjectConfig_OmitsEmptyStringField verifies the omitempty
+// fix at the field level: setting a string field to "" does not
+// emit the field. This is the layer that stops a user setting
+// `tui.theme = ""` (or any other empty string) from re-introducing
+// zero-spam.
+func TestSetProjectConfig_OmitsEmptyStringField(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	// tui.theme is whitelisted; setting to empty should be a no-op
+	// on the file's emitted content (or at most, not write the
+	// theme line).
+	if err := setConfig(path, "tui.theme", ""); err != nil {
+		t.Fatalf("setConfig: %v", err)
+	}
+	data, _ := os.ReadFile(path)
+	body := string(data)
+	if strings.Contains(body, "theme") {
+		t.Errorf("empty theme still emitted, got:\n%s", body)
+	}
+}
+
+// TestSetProjectConfig_SetsBoolFieldCorrectly verifies that the
+// whitelisted `tui.vim` boolean (kept as a non-pointer bool per
+// the plan — the default-equals-false case where the encoder can
+// skip without losing user intent) round-trips for the `true`
+// case. The `false` case is the Go zero value, so omitempty drops
+// it — which matches the user's effective intent.
+func TestSetProjectConfig_SetsBoolFieldCorrectly(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	if err := setConfig(path, "tui.vim", "true"); err != nil {
+		t.Fatalf("setConfig: %v", err)
+	}
+	data, _ := os.ReadFile(path)
+	if !strings.Contains(string(data), "vim = true") {
+		t.Errorf("vim=true not present, got:\n%s", data)
+	}
+}
+
+// TestSetProjectConfig_SLMEnabledOmitsDurationFields verifies the
+// 2026-06-04 follow-up fix: setting `slm.enabled = true` on a
+// fresh file no longer emits `startup_timeout = 0` or
+// `classify_timeout = 0` zero-spam. Both Duration fields are
+// pointer-converted (`*Duration`) so the encoder honors
+// `omitempty` when the pointer is nil.
+func TestSetProjectConfig_SLMEnabledOmitsDurationFields(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.toml")
+
+	if err := setConfig(path, "slm.enabled", "true"); err != nil {
+		t.Fatalf("setConfig: %v", err)
+	}
+	data, _ := os.ReadFile(path)
+	body := string(data)
+
+	if strings.Contains(body, "startup_timeout") {
+		t.Errorf("startup_timeout emitted as zero-spam, got:\n%s", body)
+	}
+	if strings.Contains(body, "classify_timeout") {
+		t.Errorf("classify_timeout emitted as zero-spam, got:\n%s", body)
+	}
+}
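The temp-file-plus-rename pattern that writeAtomicTOML and writeAtomicBytes share can be tried in isolation. This is a sketch, not the repository's helper: `writeFileAtomic` and `demo` are hypothetical names, and error cleanup is folded into one deferred function instead of the per-branch cleanup the diff uses.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file in the target's
// directory, syncs it, and renames it over path. On any error
// the temp file is removed and path is left untouched.
func writeFileAtomic(path string, data []byte) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp: %w", err)
	}
	defer func() {
		if err != nil {
			_ = tmp.Close() // harmless if already closed
			_ = os.Remove(tmp.Name())
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return fmt.Errorf("write temp: %w", err)
	}
	if err = tmp.Sync(); err != nil {
		return fmt.Errorf("sync temp: %w", err)
	}
	if err = tmp.Close(); err != nil {
		return fmt.Errorf("close temp: %w", err)
	}
	if err = os.Rename(tmp.Name(), path); err != nil {
		return fmt.Errorf("rename temp: %w", err)
	}
	return nil
}

// demo writes twice to the same path in a scratch directory and
// returns the final content plus the number of directory entries.
func demo() (string, int, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-*")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "config.toml")
	for _, v := range []string{"v1\n", "v2\n"} {
		if err := writeFileAtomic(path, []byte(v)); err != nil {
			return "", 0, err
		}
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(data), len(entries), nil
}

func main() {
	content, n, err := demo()
	fmt.Printf("content=%q entries=%d err=%v\n", content, n, err)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is the property the doc comment on writeAtomicTOML relies on. A successful run leaves only the target file behind, matching what TestSetProjectConfig_AtomicWriteLeavesNoTempFile asserts.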
@@ -49,7 +49,7 @@ func ParseHookDefs(cfgs []config.HookConfig) ([]HookDef, error) {
 			Command:     cmd,
 			Exec:        c.Exec,
 			Timeout:     timeout,
-			FailOpen:    c.FailOpen,
+			FailOpen:    c.FailOpen != nil && *c.FailOpen,
 			ToolPattern: toolPattern,
 		}
 		if err := def.Validate(); err != nil {
@@ -8,6 +8,7 @@ import (
 )

 func TestParseHookDefs_ValidConfig(t *testing.T) {
+	failOpen := true
 	cfgs := []config.HookConfig{
 		{
 			Name:        "log-tools",
@@ -15,7 +16,7 @@ func TestParseHookDefs_ValidConfig(t *testing.T) {
 			Type:        "command",
 			Exec:        "tee -a /tmp/log.jsonl",
 			Timeout:     "5s",
-			FailOpen:    true,
+			FailOpen:    &failOpen,
 			ToolPattern: "bash*",
 		},
 	}
@@ -105,13 +105,18 @@ func (l *Loader) Load(plugins []Plugin, enabledSet map[string]bool, pins PinStor
 			if execPath != "" && !filepath.IsAbs(execPath) {
 				execPath = filepath.Join(p.Dir, execPath)
 			}
+			var failOpen *bool
+			if h.FailOpen {
+				v := true
+				failOpen = &v
+			}
 			result.Hooks = append(result.Hooks, config.HookConfig{
 				Name:        h.Name,
 				Event:       h.Event,
 				Type:        h.Type,
 				Exec:        execPath,
 				Timeout:     h.Timeout,
-				FailOpen:    h.FailOpen,
+				FailOpen:    failOpen,
 				ToolPattern: h.ToolPattern,
 			})
 		}
@@ -186,6 +186,26 @@ func translateRequest(req provider.Request) oai.ChatCompletionNewParams {
 		params.ReasoningEffort = effortToReasoningEffort(req.Thinking.Level)
 	}

+	// Honour ResponseFormat. ollama (via OpenAI-compatible endpoint) and
+	// llama.cpp both translate response_format=json_object to a decoding-
+	// time JSON constraint, which is the only reliable way to keep small
+	// models from emitting prose where structured output is required.
+	// Previously this field was silently dropped on the OpenAI path,
+	// which is why the SLM classifier saw a 100% prose-failure rate even
+	// after Move 1 wired ResponseFormat at the gnoma layer.
+	if req.ResponseFormat != nil {
+		switch req.ResponseFormat.Type {
+		case provider.ResponseJSON:
+			params.ResponseFormat = oai.ChatCompletionNewParamsResponseFormatUnion{
+				OfJSONObject: &shared.ResponseFormatJSONObjectParam{},
+			}
+		case provider.ResponseText:
+			params.ResponseFormat = oai.ChatCompletionNewParamsResponseFormatUnion{
+				OfText: &shared.ResponseFormatTextParam{},
+			}
+		}
+	}
+
 	if len(params.Tools) > 0 {
 		choice := "auto"
 		if req.ToolChoice != "" {
@@ -189,3 +189,47 @@ func TestTranslateRequest_ToolChoiceDefault(t *testing.T) {
 		})
 	}
 }
+
+func TestTranslateRequest_ResponseFormatJSON(t *testing.T) {
+	req := provider.Request{
+		Model: "qwen2.5-coder:1.5b",
+		Messages: []message.Message{
+			{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "hi"}}},
+		},
+		ResponseFormat: &provider.ResponseFormat{Type: provider.ResponseJSON},
+	}
+	params := translateRequest(req)
+	if params.ResponseFormat.OfJSONObject == nil {
+		t.Errorf("expected OfJSONObject set when ResponseFormat=ResponseJSON, got %+v", params.ResponseFormat)
+	}
+	if params.ResponseFormat.OfText != nil {
+		t.Errorf("expected OfText nil when ResponseFormat=ResponseJSON")
+	}
+}
+
+func TestTranslateRequest_ResponseFormatText(t *testing.T) {
+	req := provider.Request{
+		Model: "qwen2.5-coder:1.5b",
+		Messages: []message.Message{
+			{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "hi"}}},
+		},
+		ResponseFormat: &provider.ResponseFormat{Type: provider.ResponseText},
+	}
+	params := translateRequest(req)
+	if params.ResponseFormat.OfText == nil {
+		t.Errorf("expected OfText set when ResponseFormat=ResponseText, got %+v", params.ResponseFormat)
+	}
+}
+
+func TestTranslateRequest_ResponseFormatUnset(t *testing.T) {
+	req := provider.Request{
+		Model: "qwen2.5-coder:1.5b",
+		Messages: []message.Message{
+			{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "hi"}}},
+		},
+	}
+	params := translateRequest(req)
+	if params.ResponseFormat.OfJSONObject != nil || params.ResponseFormat.OfText != nil {
+		t.Errorf("expected zero-valued ResponseFormat when not set, got %+v", params.ResponseFormat)
+	}
+}
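On the wire, the JSON branch above corresponds to a `response_format` object with `"type": "json_object"` in the chat-completions request body. The sketch below shows that shape using only `encoding/json`; `chatRequest`, `responseFormat`, and `encodeRequest` are hypothetical types written for illustration (the real code goes through the SDK's parameter types), and the request is trimmed to two fields, so it omits `messages` and is not a complete request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// responseFormat is a hypothetical, trimmed mirror of the
// response_format object in an OpenAI-compatible request.
type responseFormat struct {
	Type string `json:"type"`
}

// chatRequest is a hypothetical, trimmed request body. The
// pointer plus omitempty keeps response_format off the wire
// when the caller did not ask for a format.
type chatRequest struct {
	Model          string          `json:"model"`
	ResponseFormat *responseFormat `json:"response_format,omitempty"`
}

// encodeRequest marshals a request, optionally in JSON mode.
func encodeRequest(model string, jsonMode bool) string {
	req := chatRequest{Model: model}
	if jsonMode {
		req.ResponseFormat = &responseFormat{Type: "json_object"}
	}
	data, err := json.Marshal(req)
	if err != nil {
		return ""
	}
	return string(data)
}

func main() {
	fmt.Println(encodeRequest("qwen2.5-coder:1.5b", true))
	fmt.Println(encodeRequest("qwen2.5-coder:1.5b", false))
}
```

The unset case serializes without any `response_format` key, which parallels what TestTranslateRequest_ResponseFormatUnset checks at the SDK-parameter level.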
@@ -93,16 +93,27 @@ func DiscoverOllama(ctx context.Context, baseURL string, probeCache map[string]O
 			Size:     m.Size,
 		}

+		// Always probe; the cache is optional. Previously nil-cache was
+		// treated as "skip probing entirely", which left SupportsTools
+		// at its zero value (false) for every model — every ollama-
+		// discovered arm then got marked as tool-unsupported and
+		// rejected by filterFeasible for any tool-requiring task. main.go
+		// passes nil from the synchronous discovery path; we still want
+		// real probe data there.
+		var result OllamaProbeResult
 		if probeCache != nil {
-			result, ok := probeCache[m.Name]
-			if !ok {
+			if cached, ok := probeCache[m.Name]; ok {
+				result = cached
+			} else {
 				result = probeOllamaModel(ctx, baseURL, m.Name)
 				probeCache[m.Name] = result
 			}
-			dm.SupportsTools = result.SupportsTools
-			dm.SupportsVision = result.SupportsVision
-			dm.ContextSize = result.ContextSize
+		} else {
+			result = probeOllamaModel(ctx, baseURL, m.Name)
 		}
+		dm.SupportsTools = result.SupportsTools
+		dm.SupportsVision = result.SupportsVision
+		dm.ContextSize = result.ContextSize

 		if dm.ContextSize == 0 {
 			dm.ContextSize = defaultOllamaContextSize
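The "always probe, cache is optional" rule in this hunk rests on a Go detail: reading from a nil map is safe, while writing to one panics. A compact way to express the same rule is sketched below; `probeResult` and `probeWithOptionalCache` are hypothetical names, and the probe is passed in as a function so the sketch needs no network access.

```go
package main

import "fmt"

// probeResult is a hypothetical, trimmed stand-in for the
// per-model probe data.
type probeResult struct {
	SupportsTools bool
	ContextSize   int
}

// probeWithOptionalCache always returns probe data. A nil cache
// means "do not remember results", not "skip probing".
func probeWithOptionalCache(cache map[string]probeResult, name string, probe func(string) probeResult) probeResult {
	if cached, ok := cache[name]; ok { // lookup on a nil map is safe
		return cached
	}
	result := probe(name)
	if cache != nil { // assignment to a nil map would panic
		cache[name] = result
	}
	return result
}

func main() {
	calls := 0
	probe := func(string) probeResult {
		calls++
		return probeResult{SupportsTools: true, ContextSize: 8192}
	}

	fmt.Println(probeWithOptionalCache(nil, "model-a", probe).SupportsTools)

	cache := map[string]probeResult{}
	probeWithOptionalCache(cache, "model-a", probe)
	probeWithOptionalCache(cache, "model-a", probe) // served from cache
	fmt.Println("probe calls:", calls)
}
```

With a nil cache every call probes; with a real map the second lookup for the same model is served from the cache, so the probe runs twice in total across the three calls in `main`.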
@@ -1,6 +1,7 @@
 package router

 import (
+	"log/slog"
 	"math"
 )

@@ -281,20 +282,39 @@ func effectiveCost(arm *Arm, task Task) float64 {
 // filterFeasible returns arms that can handle the task (tools, pool capacity, quality).
 // Arms that pass tool and pool checks but fall below the task's minimum quality threshold
 // are collected separately and used as a last resort if no arm meets the threshold.
+//
+// When the result is empty the caller surfaces a generic "no feasible arm"
+// error; rejection reasons are logged here at slog.Debug per-arm so users
+// debugging "why did the router reject everything?" with --verbose can see
+// the actual constraint each arm tripped instead of guessing.
 func filterFeasible(arms []*Arm, task Task) []*Arm {
 	threshold := DefaultThresholds[task.Type]

 	var feasible []*Arm
 	var belowQuality []*Arm // passed tool+pool but scored below minimum quality

+	reject := func(arm *Arm, reason string, fields ...any) {
+		base := []any{
+			"arm", arm.ID,
+			"task", task.Type,
+			"complexity", task.ComplexityScore,
+			"reason", reason,
+		}
+		slog.Debug("filterFeasible: rejected", append(base, fields...)...)
+	}
+
 	for _, arm := range arms {
 		// Complexity ceiling: zero means no ceiling (preserves behavior for all existing arms).
 		if arm.MaxComplexity > 0 && task.ComplexityScore > arm.MaxComplexity {
+			reject(arm, "complexity_exceeds_max",
+				"max_complexity", arm.MaxComplexity)
 			continue
 		}

 		// Must support tools if task requires them
 		if task.RequiresTools && !arm.SupportsTools() {
+			reject(arm, "tools_required_but_unsupported",
+				"tool_use_capability", arm.Capabilities.ToolUse)
 			continue
 		}

@@ -303,11 +323,15 @@ func filterFeasible(arms []*Arm, task Task) []*Arm {
 		// cannot consume the image bytes, so degrading to it would silently
 		// drop the image and confuse the model.
 		if task.RequiresVision && !arm.Capabilities.Vision {
+			reject(arm, "vision_required_but_unsupported",
+				"vision_capability", arm.Capabilities.Vision)
 			continue
 		}

 		// Must support the required effort level (EffortAuto always passes)
 		if !arm.Capabilities.SupportsEffort(task.RequiredEffort) {
+			reject(arm, "effort_level_unsupported",
+				"required_effort", task.RequiredEffort)
 			continue
 		}

@@ -316,6 +340,8 @@ func filterFeasible(arms []*Arm, task Task) []*Arm {
 		for _, pool := range arm.Pools {
 			pool.CheckReset()
 			if !pool.CanAfford(arm.ID, task.EstimatedTokens) {
+				reject(arm, "pool_capacity_exceeded",
+					"estimated_tokens", task.EstimatedTokens)
 				poolsOK = false
 				break
 			}
@@ -333,6 +359,16 @@ func filterFeasible(arms []*Arm, task Task) []*Arm {
 		feasible = append(feasible, arm)
 	}

+	if len(feasible) == 0 && len(belowQuality) == 0 {
+		slog.Debug("filterFeasible: no arms feasible at any quality level",
+			"task", task.Type,
+			"complexity", task.ComplexityScore,
+			"requires_tools", task.RequiresTools,
+			"requires_vision", task.RequiresVision,
+			"arms_considered", len(arms),
+		)
+	}
+
 	// Degrade gracefully: if no arm meets quality threshold, use below-quality ones
 	if len(feasible) == 0 && len(belowQuality) > 0 {
 		return belowQuality
@@ -14,10 +14,13 @@ import (
 	"somegit.dev/Owlibou/gnoma/internal/stream"
 )

-// defaultClassifyTimeout — 5 s accommodates thinking-mode models like
-// Qwen3 distillations (Tiny3.5) that emit reasoning tokens before output.
-// Non-thinking models complete in well under 1 s.
-const defaultClassifyTimeout = 5 * time.Second
+// defaultClassifyTimeout — 15 s accommodates cold-start model loads
+// (ollama lazily loads on first call, ~2-8s for a 1.5B model on SSD)
+// combined with thinking-mode first-token latency (Qwen3 distillations
+// like Tiny3.5 sometimes emit <think> tokens before the JSON output
+// even with /no_think). Non-thinking warm models complete in well
+// under 1 s. Tune via [slm].classify_timeout in config.
+const defaultClassifyTimeout = 15 * time.Second

 const classifySystemPrompt = `Classify the following coding request. /no_think
 Respond with JSON only, no other text, no reasoning, no thinking tags.
@@ -47,14 +50,18 @@ type Classifier struct {

 // NewClassifier creates a Classifier. model is the model name passed to the provider
 // (llamafile ignores it but openaicompat requires a non-empty value).
-func NewClassifier(p provider.Provider, model string, logger *slog.Logger) *Classifier {
+// Pass timeout=0 to use the built-in default (defaultClassifyTimeout).
+func NewClassifier(p provider.Provider, model string, timeout time.Duration, logger *slog.Logger) *Classifier {
 	if logger == nil {
 		logger = slog.Default()
 	}
+	if timeout <= 0 {
+		timeout = defaultClassifyTimeout
+	}
 	return &Classifier{
 		provider: p,
 		model:    model,
-		timeout:  defaultClassifyTimeout,
+		timeout:  timeout,
 		logger:   logger,
 	}
 }
@@ -68,7 +75,11 @@ func (c *Classifier) Classify(ctx context.Context, prompt string, history []mess

 	resp, err := c.callSLM(tctx, prompt)
 	if err != nil {
-		c.logger.Debug("slm classify fallback", "error", err)
+		// Warn-level so a first-time misconfiguration (timeout too tight,
+		// wrong endpoint, malformed JSON from the model) surfaces without
+		// requiring --verbose. The fallback path itself is benign; the
+		// signal is that the SLM isn't doing the work it was supposed to.
+		c.logger.Warn("slm classify fallback", "error", err, "timeout", c.timeout)
 		t, ferr := router.HeuristicClassifier{}.Classify(ctx, prompt, history)
 		t.ClassifierSource = router.ClassifierSLMFallback
 		return t, ferr
@@ -91,9 +102,25 @@ func (c *Classifier) Classify(ctx context.Context, prompt string, history []mess
 }

 func (c *Classifier) callSLM(ctx context.Context, prompt string) (*classifyResponse, error) {
+	// Constrain the model toward valid, deterministic JSON output. Without
+	// these settings small models routinely ignore the JSON-only system
+	// prompt, emit reasoning blocks (<think>, <Thought Process>) or just
+	// answer the user's prompt in prose. ResponseFormat=json_object asks
+	// the provider to enforce JSON at decoding time where supported
+	// (ollama 'format=json', llama.cpp grammar, OpenAI json_object). Even
+	// when the provider can't enforce, the explicit signal nudges the
+	// adapter to set the right backend flag.
+	temp := 0.0
+	topP := 1.0
 	req := provider.Request{
 		Model:        c.model,
 		SystemPrompt: classifySystemPrompt,
+		Temperature:  &temp,
+		TopP:         &topP,
+		MaxTokens:    128, // classification output is ~50 tokens; cap to prevent runaway reasoning
+		ResponseFormat: &provider.ResponseFormat{
+			Type: provider.ResponseJSON,
+		},
 		Messages: []message.Message{
 			{
 				Role:    message.RoleUser,
@@ -127,10 +154,22 @@ func (c *Classifier) callSLM(ctx context.Context, prompt string) (*classifyRespo
 	return &resp, nil
 }

-// extractJSON pulls the first {...} substring from s, stripping markdown fences if present.
+// extractJSON pulls the first {...} substring from s, stripping markdown
+// fences and known thinking-block tags. Small models routinely violate
+// the JSON-only system prompt by emitting reasoning tokens first, so
+// the extractor must tolerate prefixes the model wasn't asked to emit.
 func extractJSON(s string) string {
 	s = strings.TrimSpace(s)

+	// Strip known thinking-block tags. Order matters: longer/more-
+	// specific names first so a partial match doesn't shadow a real
+	// one. Seen in the wild on Qwen3 (<think>) and tiny3.5
+	// (<Thought Process>); the others are defensive against similar
+	// fine-tunes.
+	for _, tag := range []string{"Thought Process", "thinking", "reasoning", "thoughts", "think"} {
+		s = stripTagBlock(s, tag)
+	}
+
 	// Strip ```json ... ``` fences.
 	if strings.HasPrefix(s, "```") {
 		end := strings.LastIndex(s, "```")
@@ -160,3 +199,28 @@ func extractJSON(s string) string {
 	}
 	return s[start:]
 }
+
+// stripTagBlock removes <tag>...</tag> blocks (case-insensitive on the
+// tag name) from the start of s. Returns the original string if the tag
+// is not at the start. Idempotent; safe to call repeatedly.
+func stripTagBlock(s, tag string) string {
+	trimmed := strings.TrimSpace(s)
+	open := "<" + tag
+	lower := strings.ToLower(trimmed)
+	if !strings.HasPrefix(lower, strings.ToLower(open)) {
+		return s
+	}
+	// Find the matching closing tag, case-insensitive.
+	close := "</" + tag + ">"
+	closeIdx := strings.Index(strings.ToLower(trimmed), strings.ToLower(close))
+	if closeIdx < 0 {
+		// Unterminated thinking block — strip up to the first '{'
+		// so we still have a shot at extracting JSON that follows.
+		braceIdx := strings.IndexByte(trimmed, '{')
+		if braceIdx > 0 {
+			return strings.TrimSpace(trimmed[braceIdx:])
+		}
+		return s
+	}
+	return strings.TrimSpace(trimmed[closeIdx+len(close):])
+}
@@ -54,7 +54,7 @@ func TestClassifier_HappyPath(t *testing.T) {
 	// SLM complexity 0.55 stays above the Debug floor (0.4), so the SLM
 	// value is preserved verbatim.
 	p := &mockProvider{text: `{"task_type":"Debug","complexity":0.55,"requires_tools":false}`}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	task, err := cls.Classify(context.Background(), "fix the failing test", nil)
 	if err != nil {
@@ -76,7 +76,7 @@ func TestClassifier_AppliesTaskTypeFloor(t *testing.T) {
 	// bump ComplexityScore up to the floor so the SLM arm can't be picked
 	// for its own kind of misclassification.
 	p := &mockProvider{text: `{"task_type":"Debug","complexity":0.25,"requires_tools":false}`}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	task, err := cls.Classify(context.Background(), "fix the failing test", nil)
 	if err != nil {
@@ -91,7 +91,7 @@ func TestClassifier_AppliesTaskTypeFloor(t *testing.T) {
 func TestClassifier_BlendHeuristic(t *testing.T) {
 	// SLM returns one type; other Task fields should come from heuristic.
 	p := &mockProvider{text: `{"task_type":"Boilerplate","complexity":0.1,"requires_tools":false}`}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	task, err := cls.Classify(context.Background(), "scaffold a new HTTP handler", nil)
 	if err != nil {
@@ -108,7 +108,7 @@ func TestClassifier_BlendHeuristic(t *testing.T) {

 func TestClassifier_FallbackOnBadJSON(t *testing.T) {
 	p := &mockProvider{text: "I cannot classify that."}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	// Should not error — falls back to heuristic.
 	task, err := cls.Classify(context.Background(), "write unit tests for the parser", nil)
@@ -123,7 +123,7 @@ func TestClassifier_FallbackOnBadJSON(t *testing.T) {

 func TestClassifier_FallbackOnProviderError(t *testing.T) {
 	p := &mockProvider{err: errors.New("connection refused")}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	task, err := cls.Classify(context.Background(), "explain how generics work", nil)
 	if err != nil {
@@ -137,7 +137,7 @@ func TestClassifier_FallbackOnProviderError(t *testing.T) {

 func TestClassifier_FallbackOnTimeout(t *testing.T) {
 	p := &mockProvider{delay: 500 * time.Millisecond}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)
 	cls.timeout = 50 * time.Millisecond // force timeout

 	task, err := cls.Classify(context.Background(), "debug the failing test", nil)
@@ -153,7 +153,7 @@ func TestClassifier_FallbackOnTimeout(t *testing.T) {
 func TestClassifier_FenceStripping(t *testing.T) {
 	fenced := "```json\n{\"task_type\":\"Refactor\",\"complexity\":0.5,\"requires_tools\":true}\n```"
 	p := &mockProvider{text: fenced}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	task, err := cls.Classify(context.Background(), "refactor the auth middleware", nil)
 	if err != nil {
@@ -166,7 +166,7 @@ func TestClassifier_FenceStripping(t *testing.T) {

 func TestClassifier_UnknownTaskType_FallsBackToHeuristic(t *testing.T) {
 	p := &mockProvider{text: `{"task_type":"FooBar","complexity":0.3,"requires_tools":false}`}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	task, err := cls.Classify(context.Background(), "implement a binary search function", nil)
 	if err != nil {
@@ -178,7 +178,7 @@ func TestClassifier_UnknownTaskType_FallsBackToHeuristic(t *testing.T) {

 func TestClassifier_SetsClassifierSource_OnSuccess(t *testing.T) {
 	p := &mockProvider{text: `{"task_type":"Debug","complexity":0.3,"requires_tools":true}`}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)
 	task, err := cls.Classify(context.Background(), "fix the failing test", nil)
 	if err != nil {
 		t.Fatal(err)
@@ -190,7 +190,7 @@ func TestClassifier_SetsClassifierSource_OnSuccess(t *testing.T) {

 func TestClassifier_SetsClassifierSource_OnFallback(t *testing.T) {
 	p := &mockProvider{err: errors.New("backend unreachable")}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)
 	task, err := cls.Classify(context.Background(), "fix the failing test", nil)
 	if err != nil {
 		t.Fatal(err)
@@ -202,7 +202,7 @@ func TestClassifier_SetsClassifierSource_OnFallback(t *testing.T) {

 func TestClassifier_ContextPassedToHistory(t *testing.T) {
 	p := &mockProvider{text: `{"task_type":"Explain","complexity":0.2,"requires_tools":false}`}
-	cls := NewClassifier(p, "default", nil)
+	cls := NewClassifier(p, "default", 0, nil)

 	history := []message.Message{
 		{Role: message.RoleUser, Content: []message.Content{{Type: message.ContentText, Text: "prior"}}},
@@ -215,3 +215,45 @@ func TestClassifier_ContextPassedToHistory(t *testing.T) {
 		t.Errorf("Type = %s, want Explain", task.Type)
 	}
 }
+
+func TestExtractJSON_StripsThinkingTags(t *testing.T) {
+	cases := []struct {
+		name string
+		in   string
+		want string
+	}{
+		{
+			name: "qwen-think-block",
+			in:   `<think>Let me decide</think>{"task_type":"Debug","complexity":0.5,"requires_tools":true}`,
+			want: `{"task_type":"Debug","complexity":0.5,"requires_tools":true}`,
+		},
+		{
+			name: "tiny3.5-thought-process",
+			in:   "<Thought Process>\nUser wants debugging help.\n</Thought Process>\n{\"task_type\":\"Debug\",\"complexity\":0.4,\"requires_tools\":true}",
+			want: `{"task_type":"Debug","complexity":0.4,"requires_tools":true}`,
+		},
+		{
+			name: "unterminated-think-falls-back-to-brace",
+			in:   `<think>incomplete reasoning {"task_type":"Explain","complexity":0.2,"requires_tools":false}`,
+			want: `{"task_type":"Explain","complexity":0.2,"requires_tools":false}`,
+		},
+		{
+			name: "no-tags-still-works",
+			in:   `{"task_type":"Generation","complexity":0.6,"requires_tools":false}`,
+			want: `{"task_type":"Generation","complexity":0.6,"requires_tools":false}`,
+		},
+		{
+			name: "fenced-json-still-works",
+			in:   "```json\n{\"task_type\":\"Refactor\",\"complexity\":0.5,\"requires_tools\":true}\n```",
+			want: `{"task_type":"Refactor","complexity":0.5,"requires_tools":true}`,
+		},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			got := extractJSON(tc.in)
+			if got != tc.want {
+				t.Errorf("extractJSON(...)\n  got:  %q\n  want: %q", got, tc.want)
+			}
+		})
+	}
+}
@@ -1146,6 +1146,15 @@ func (m Model) submitInput(input string) (tea.Model, tea.Cmd) {
 	m.thinkingBuf.Reset()
 	m.streamFilterClose = ""

+	// Recover from a prior StateError before submitting a fresh user
+	// prompt. A transient routing or engine failure used to leave the
+	// session in error state, blocking every subsequent prompt with
+	// "session not idle (state: error)" until the user restarted gnoma.
+	// User-initiated sends always carry an intent-to-retry, so resetting
+	// here is the safe default; the /init retry path has its own explicit
+	// ResetError that we leave alone.
+	m.session.ResetError()
+
 	if err := m.session.Send(expandedInput); err != nil {
 		m.messages = append(m.messages, chatMessage{role: "error", content: formatError(err)})
 		m.streaming = false
@@ -1494,6 +1503,8 @@ func (m Model) handleCommand(cmd string) (tea.Model, tea.Cmd) {
 		m.initWriteNudged = false

 		opts := engine.TurnOptions{}
+		// Recover from prior StateError before /init can submit.
+		m.session.ResetError()
 		if err := m.session.SendWithOptions(prompt, opts); err != nil {
 			m.messages = append(m.messages, chatMessage{role: "error", content: formatError(err)})
 			m.streaming = false
@@ -1695,6 +1706,8 @@ func (m Model) handleCommand(cmd string) (tea.Model, tea.Cmd) {
 					AllowedTools: sk.Frontmatter.AllowedTools,
 					AllowedPaths: sk.Frontmatter.Paths,
 				}
+				// Recover from prior StateError before the skill submits.
+				m.session.ResetError()
 				if err := m.session.SendWithOptions(rendered, skillOpts); err != nil {
 					m.messages = append(m.messages, chatMessage{role: "error", content: formatError(err)})
 					m.streaming = false