# Gnoma — TODO

Active work, newest first.

## In flight

- **Config write/merge — silent corruption of layered configs.**
  `internal/config/write.go:setConfig` reads the existing TOML into a
  zero-valued `Config` struct, sets one field, and writes the entire
  struct back out — so every untouched field gets serialized at its
  Go zero value (empty strings, zero ints, `false` bools). On the
  next load, those explicit zeros overwrite higher-priority layers
  via `toml.Decode`'s "present field beats absent field" semantics.

  Concrete symptom (2026-05-24): user's `~/.config/gnoma/config.toml`
  had `[router].prefer = "cloud"` but the project-level
  `.gnoma/config.toml` had `prefer = ""` (generated by an earlier
  `gnoma config set ...` call), which silently downgraded the
  effective policy to `auto` — visible only via the new `/router`
  TUI command, with no warning.

  The same root cause explains the zero-spammed global config that
  user also has (`max_tokens = 0`, `permission.mode = ""`,
  `bash_timeout = 0`, etc.), with every zero overriding a sensible
  default.
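
  A minimal sketch of that failure mode, modelling each decoded layer as a flat map of dotted keys. This is an illustration, not gnoma's loader, which decodes into `Config` via `toml.Decode`. `mergeLayers` and `effectivePrefer` are hypothetical helpers. Treating an empty `prefer` as `auto` is an assumption based on the symptom described above.

  ```go
  package main

  import "fmt"

  // mergeLayers applies layers from lowest to highest priority. A key that
  // is present in a later layer replaces the earlier value even when the new
  // value is a zero value such as "" or 0 ("present beats absent").
  func mergeLayers(layers ...map[string]any) map[string]any {
      out := map[string]any{}
      for _, layer := range layers {
          for k, v := range layer {
              out[k] = v
          }
      }
      return out
  }

  // effectivePrefer treats a missing or empty router.prefer as "auto"
  // (assumed to mirror the observed downgrade).
  func effectivePrefer(cfg map[string]any) string {
      if p, ok := cfg["router.prefer"].(string); ok && p != "" {
          return p
      }
      return "auto"
  }

  func main() {
      global := map[string]any{"router.prefer": "cloud"}
      spammed := map[string]any{"router.prefer": ""} // zero value written back by a whole-struct save
      clean := map[string]any{}                      // key never written

      fmt.Println(effectivePrefer(mergeLayers(global, spammed))) // auto
      fmt.Println(effectivePrefer(mergeLayers(global, clean)))   // cloud
  }
  ```

  The takeaway is that a layer must never contain a key the user did not set. Once written, an explicit zero is indistinguishable from a deliberate choice.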

  **Fix surface (multi-part, plan-worthy):**

  1. **Stop generating zero-spam.** Three options:
     - Tag struct fields with `,omitempty` so the BurntSushi encoder
       skips zero values. Caveat: conflates "unset" with "explicitly
       zero" for primitive types (a user who wants `max_keep = 0`
       loses it). Safe for strings/maps/slices where empty is never
       user intent; lossy for numeric and bool fields.
     - Switch to `pelletier/go-toml/v2` and use its document model
       to edit only the targeted key, preserving everything else
       byte-for-byte. Cleaner semantics, bigger refactor.
     - Hybrid: omitempty on string/map/slice fields, document-level
       edit for numeric and bool fields. Fastest path that doesn't
       lose intent.
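
     Whichever option wins, the change to `setConfig` has the same core: mutate only the keys actually present in the file, instead of round-tripping through a zero-valued struct. A minimal sketch, assuming the file is first decoded into a generic `map[string]any` rather than into `Config`. `setKey` is a hypothetical helper. Re-encoding such a map writes back only the keys that were already there, plus the one being set. Unlike the document-model option, it does not preserve comments or key order.

     ```go
     package main

     import (
         "fmt"
         "strings"
     )

     // setKey sets a dotted key such as "router.prefer" in a decoded TOML
     // document, creating intermediate tables as needed. All other keys are
     // left untouched, so nothing absent gets materialized as a zero value.
     func setKey(doc map[string]any, dotted string, val any) error {
         parts := strings.Split(dotted, ".")
         table := doc
         for _, name := range parts[:len(parts)-1] {
             existing, ok := table[name]
             if !ok {
                 child := map[string]any{}
                 table[name] = child
                 table = child
                 continue
             }
             child, ok := existing.(map[string]any)
             if !ok {
                 return fmt.Errorf("%s: %q is not a table", dotted, name)
             }
             table = child
         }
         table[parts[len(parts)-1]] = val
         return nil
     }

     func main() {
         // Roughly what decoding a file containing only
         // [router] prefer = "cloud" into a map would produce.
         doc := map[string]any{"router": map[string]any{"prefer": "cloud"}}

         if err := setKey(doc, "config.auto_migrate", false); err != nil {
             fmt.Println("error:", err)
             return
         }
         fmt.Println(doc) // map[config:map[auto_migrate:false] router:map[prefer:cloud]]
     }
     ```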

  2. **`gnoma doctor` — read-only diagnostic.** Scans both global
     and project configs and reports:
     - Zero-spam fields that would silently shadow defaults or
       upstream layers.
     - Invalid enum values (e.g. `permission.mode = ""`).
     - Unknown / removed keys from older schema versions.
     - Effective merged values, so the user sees what gnoma will
       actually use after layering.

     The command performs no writes and exits non-zero on findings,
     so it's CI-friendly.
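
     A minimal sketch of the first three checks, assuming each config layer is available as a flat map of dotted keys. It also assumes gnoma can supply a defaults table and the allowed enum values. `diagnose` is hypothetical. The keys, defaults, and enum values in `main` are placeholders, not gnoma's real schema. The effective-merged view is left out.

     ```go
     package main

     import (
         "fmt"
         "reflect"
         "sort"
     )

     // diagnose flags unknown keys, invalid enum values, and zero values that
     // would shadow a non-zero default. Keys are visited in sorted order so the
     // report is stable across runs, which matters when it gates CI.
     func diagnose(layer, defaults map[string]any, enums map[string][]string) []string {
         keys := make([]string, 0, len(layer))
         for k := range layer {
             keys = append(keys, k)
         }
         sort.Strings(keys)

         var findings []string
         for _, k := range keys {
             v := layer[k]
             def, known := defaults[k]
             switch {
             case !known:
                 findings = append(findings, fmt.Sprintf("%s: unknown key", k))
             case enums[k] != nil && !contains(enums[k], v):
                 findings = append(findings, fmt.Sprintf("%s: invalid value %#v", k, v))
             case isZero(v) && !isZero(def):
                 findings = append(findings, fmt.Sprintf("%s: zero value shadows default %v", k, def))
             }
         }
         return findings
     }

     // contains reports whether v is a string in allowed.
     func contains(allowed []string, v any) bool {
         s, ok := v.(string)
         if !ok {
             return false
         }
         for _, a := range allowed {
             if a == s {
                 return true
             }
         }
         return false
     }

     // isZero reports whether v is nil or its type's zero value.
     func isZero(v any) bool {
         return v == nil || reflect.ValueOf(v).IsZero()
     }

     func main() {
         // Placeholder schema for illustration only.
         defaults := map[string]any{"permission.mode": "ask", "bash_timeout": 30}
         enums := map[string][]string{"permission.mode": {"ask", "auto"}}
         project := map[string]any{"permission.mode": "", "bash_timeout": 0, "old_key": true}

         findings := diagnose(project, defaults, enums)
         for _, f := range findings {
             fmt.Println(f)
         }
         // A real command would exit with status 1 when len(findings) > 0.
     }
     ```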

  3. **`gnoma upgrade-config` — active migration.** For each config
     file (global, profiles, project):
     - Compute the cleaned form (only fields the user actually set,
       dropping zeros that match defaults).
     - Write the original to `<path>.bak` with timestamp suffix.
     - Write the cleaned form to the original path.
     - Print a diff of what changed so the user can verify.
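
     The diff can be key-level rather than textual. That keeps it readable even when re-encoding reorders the file. A minimal sketch, assuming the original and cleaned configs are both available as flat maps of dotted keys. `diffKeys` and its `-`/`+`/`~` notation are hypothetical.

     ```go
     package main

     import (
         "fmt"
         "reflect"
         "sort"
     )

     // diffKeys lists removed ("-"), added ("+"), and changed ("~") keys
     // between the original and cleaned forms, in sorted key order.
     func diffKeys(before, after map[string]any) []string {
         seen := map[string]bool{}
         var keys []string
         for _, m := range []map[string]any{before, after} {
             for k := range m {
                 if !seen[k] {
                     seen[k] = true
                     keys = append(keys, k)
                 }
             }
         }
         sort.Strings(keys)

         var lines []string
         for _, k := range keys {
             old, inOld := before[k]
             cur, inNew := after[k]
             switch {
             case inOld && !inNew:
                 lines = append(lines, fmt.Sprintf("- %s = %#v", k, old))
             case !inOld && inNew:
                 lines = append(lines, fmt.Sprintf("+ %s = %#v", k, cur))
             case !reflect.DeepEqual(old, cur):
                 lines = append(lines, fmt.Sprintf("~ %s: %#v -> %#v", k, old, cur))
             }
         }
         return lines
     }

     func main() {
         before := map[string]any{"router.prefer": "cloud", "bash_timeout": 0, "permission.mode": ""}
         after := map[string]any{"router.prefer": "cloud"}
         for _, l := range diffKeys(before, after) {
             fmt.Println(l)
         }
     }
     ```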

  4. **Project-level auto-migration on startup.** If gnoma detects
     a zero-spammed project `.gnoma/config.toml` at launch:
     - Auto-run the upgrade (project-only, never auto-touch the
       global config).
     - Write `.gnoma/config.toml.bak-YYYY-MM-DD-HHMMSS`.
     - Surface a one-line notice in the startup safety banner:
       `config: migrated .gnoma/config.toml (see .bak)`.
     - The auto-migration is non-destructive (the `.bak` preserves
       the original) but is still gated behind a
       `[config].auto_migrate` toggle that defaults to `true`. Global
       configs require an explicit `gnoma upgrade-config`.
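
     Both the manual command and the startup migration need the same backup-then-replace step. A minimal sketch using only the standard library. `migrateFile` and `writeAtomic` are hypothetical helpers. The backup name follows the `.bak-YYYY-MM-DD-HHMMSS` pattern above, using the Go layout `2006-01-02-150405`. The cleaned bytes go to a temp file in the same directory and are then renamed over the original. A crash therefore leaves either the old file or the new one. Rename is atomic on POSIX filesystems; Windows gives weaker guarantees.

     ```go
     package main

     import (
         "fmt"
         "os"
         "path/filepath"
         "time"
     )

     // backupName yields e.g. "config.toml.bak-2026-05-24-192300".
     func backupName(path string, now time.Time) string {
         return path + ".bak-" + now.Format("2006-01-02-150405")
     }

     // writeAtomic writes data to a temp file next to path, then renames it
     // into place. On any error the temp file is removed and path is untouched.
     func writeAtomic(path string, data []byte, perm os.FileMode) error {
         tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
         if err != nil {
             return err
         }
         name := tmp.Name()
         _, err = tmp.Write(data)
         if err == nil {
             err = tmp.Chmod(perm)
         }
         if err == nil {
             err = tmp.Sync()
         }
         if cerr := tmp.Close(); err == nil {
             err = cerr
         }
         if err == nil {
             err = os.Rename(name, path)
         }
         if err != nil {
             os.Remove(name) // best-effort cleanup
         }
         return err
     }

     // migrateFile saves the original to a timestamped backup, then replaces
     // it with cleaned. It returns the backup path.
     func migrateFile(path string, cleaned []byte, now time.Time) (string, error) {
         info, err := os.Stat(path)
         if err != nil {
             return "", err
         }
         orig, err := os.ReadFile(path)
         if err != nil {
             return "", err
         }
         bak := backupName(path, now)
         if err := os.WriteFile(bak, orig, info.Mode().Perm()); err != nil {
             return "", err
         }
         return bak, writeAtomic(path, cleaned, info.Mode().Perm())
     }

     // demo migrates a throwaway file and returns the backup's base name, the
     // backup's contents, and the new contents of the original path.
     func demo() (string, string, string, error) {
         dir, err := os.MkdirTemp("", "gnoma-migrate")
         if err != nil {
             return "", "", "", err
         }
         defer os.RemoveAll(dir)

         path := filepath.Join(dir, "config.toml")
         if err := os.WriteFile(path, []byte("[router]\nprefer = \"\"\n"), 0o644); err != nil {
             return "", "", "", err
         }
         now := time.Date(2026, 5, 24, 19, 23, 0, 0, time.UTC)
         // The zero-spam key was the file's only entry, so the cleaned form is empty.
         bak, err := migrateFile(path, nil, now)
         if err != nil {
             return "", "", "", err
         }
         saved, err := os.ReadFile(bak)
         if err != nil {
             return "", "", "", err
         }
         current, err := os.ReadFile(path)
         return filepath.Base(bak), string(saved), string(current), err
     }

     func main() {
         bak, _, current, err := demo()
         fmt.Printf("backup=%s new=%q err=%v\n", bak, current, err)
     }
     ```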

  5. **Project registry** (`~/.config/gnoma/projects.json`). Today
     there is no record of which directories gnoma has been launched
     in — items #2 and #3 can work with a filesystem scan
     (`find ~ -type d -name .gnoma`), but a registry makes them
     significantly faster and unlocks cross-project features.
     Sketch:

     ```json
     {
       "projects": [
         {
           "path": "/home/.../my-repo",
           "first_seen": "2026-04-15T10:30:00Z",
           "last_seen":  "2026-05-24T19:23:00Z",
           "session_count": 47
         }
       ]
     }
     ```

     Update on every successful startup (record project root,
     bump `last_seen` + increment `session_count`). Enables:
     - Fast `gnoma doctor --all-projects` without a filesystem walk.
     - Cross-project session listing (`gnoma sessions --all`
       picker; surface most-recent sessions across the registry).
     - `gnoma upgrade-config` that can migrate every known project
       in one invocation.
     - Future local-only aggregate stats (`gnoma stats`) — still
       no-phone-home, just a sum across the registry.

     **Caveats and design constraints:**
     - The registry file becomes another silent-corruption surface.
       It must follow the same zero-value discipline as the encoder
       fix in #1, plus the atomic writes below, or it'll exhibit the
       same class of bug.
     - Stale entries (deleted projects). `gnoma doctor` should
       detect and offer to prune; do not auto-delete.
     - Privacy: this is literally a log of directories the user
       has worked in. Local-only, never sent off-machine (per the
       no-phone-home positioning), but worth a one-line note in
       the Security section of the README so users know it exists.
     - Opt-out: `[config].project_registry = false` for users who
       don't want this tracked. Default `true`.
     - Atomic writes (temp file + rename) so a crash mid-write
       doesn't corrupt the file.
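
     The startup update can be sketched as load, upsert, then atomic save. Assumptions: the JSON shape above, the hypothetical `Registry`/`recordLaunch` names, and an example project path. The file is written `0600` because it is a private log of directories. A fixed `.tmp` sibling is used for the rename. Two gnoma processes starting at once would additionally need file locking, which this sketch does not attempt.

     ```go
     package main

     import (
         "encoding/json"
         "fmt"
         "os"
         "path/filepath"
         "time"
     )

     type Project struct {
         Path         string    `json:"path"`
         FirstSeen    time.Time `json:"first_seen"`
         LastSeen     time.Time `json:"last_seen"`
         SessionCount int       `json:"session_count"`
     }

     type Registry struct {
         Projects []Project `json:"projects"`
     }

     // load reads the registry; a missing file means an empty registry.
     func load(path string) (*Registry, error) {
         data, err := os.ReadFile(path)
         if os.IsNotExist(err) {
             return &Registry{}, nil
         }
         if err != nil {
             return nil, err
         }
         var r Registry
         if err := json.Unmarshal(data, &r); err != nil {
             return nil, err
         }
         return &r, nil
     }

     // recordLaunch bumps last_seen and session_count for root, or adds a new
     // entry on the first launch in that directory.
     func (r *Registry) recordLaunch(root string, now time.Time) {
         for i := range r.Projects {
             if r.Projects[i].Path == root {
                 r.Projects[i].LastSeen = now
                 r.Projects[i].SessionCount++
                 return
             }
         }
         r.Projects = append(r.Projects, Project{Path: root, FirstSeen: now, LastSeen: now, SessionCount: 1})
     }

     // save writes a temp file and renames it over path, so a crash cannot
     // leave truncated JSON behind.
     func (r *Registry) save(path string) error {
         data, err := json.MarshalIndent(r, "", "  ")
         if err != nil {
             return err
         }
         tmp := path + ".tmp"
         if err := os.WriteFile(tmp, data, 0o600); err != nil {
             return err
         }
         return os.Rename(tmp, path)
     }

     // demo simulates two launches in one project and returns the stored entry.
     func demo() (Project, error) {
         dir, err := os.MkdirTemp("", "gnoma-registry")
         if err != nil {
             return Project{}, err
         }
         defer os.RemoveAll(dir)
         path := filepath.Join(dir, "projects.json")

         start := time.Date(2026, 4, 15, 10, 30, 0, 0, time.UTC)
         for _, now := range []time.Time{start, start.Add(24 * time.Hour)} {
             reg, err := load(path)
             if err != nil {
                 return Project{}, err
             }
             reg.recordLaunch("/work/example-repo", now)
             if err := reg.save(path); err != nil {
                 return Project{}, err
             }
         }
         reg, err := load(path)
         if err != nil {
             return Project{}, err
         }
         return reg.Projects[0], nil
     }

     func main() {
         p, err := demo()
         fmt.Println(p.Path, p.SessionCount, err)
     }
     ```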

  Surfaced from the v0.3.1 launch wave (2026-05-24).
  Plan:
  [`docs/superpowers/plans/2026-05-24-config-migration.md`](docs/superpowers/plans/2026-05-24-config-migration.md).

- **Bandit selector — design decisions deferred.** The current
  selector (`internal/router/selector.go:scoreArm`) is greedy
  quality-weighted: per-(arm × task-type) EMA scores blended 70/30
  with heuristic defaults, divided by CostWeight-adjusted cost. It
  is **not** a true multi-armed bandit — no UCB-style exploration
  bonus, no Thompson sampling. Tracked as a design question rather
  than a must-implement item because of two open dependencies:

  1. **Whether to keep numeric EMA at all.** The 2026-05-07 roadmap
     (Phase 4) puts re-evaluating bandit learning on hold until the
     SLM-driven dispatcher is in production. Three options on the
     table: keep bandit as feedback for the SLM, retire EMA in
     favour of qualitative outcome summaries fed to the SLM, or
     split responsibilities (SLM = intent routing, bandit =
     cost/quality within a tier). See
     [`docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md)
     §Phase 4.

  2. **User-tunable selector knobs.** Several constants are
     hardcoded today: `qualityAlpha` (EMA smoothing, ~3-sample
     memory), the 70/30 observed/heuristic blend,
     `strengthScoreBonus` for tagged task types, and the
     `DefaultThresholds.Minimum` quality floor. Surfacing these as
     `[router.bandit]` config keys would let users tune for their
     workloads (faster alpha for shifting model performance, longer
     memory for stable fleets) without waiting for the strategic
     decision in #1.

  Surfaced from the r/coolgithubprojects v0.3.1 launch thread
  (2026-05-24, `u/Ha_Deal_5079`). The encoder + contextual bandit
  alternative is now sketched in
  [`docs/superpowers/plans/2026-05-25-encoder-bandit-router.md`](docs/superpowers/plans/2026-05-25-encoder-bandit-router.md) —
  that plan supersedes #1 above when it ships.
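
  A minimal sketch to make the greedy-versus-bandit distinction concrete. It is not `scoreArm`. Several details are assumptions: the cost term `1 + CostWeight*cost`, the point where the strength bonus is added, and every numeric value. `BanditKnobs` only illustrates what `[router.bandit]` keys could hold. `greedyScore` follows the description above: EMA quality blended 70/30 with a heuristic, then divided by a cost term. `ucbScore` adds a UCB1-style exploration bonus, which the current selector lacks.

  ```go
  package main

  import (
      "fmt"
      "math"
  )

  // BanditKnobs collects constants that are hardcoded today. Field names
  // are hypothetical stand-ins for possible [router.bandit] keys.
  type BanditKnobs struct {
      QualityAlpha   float64 // EMA smoothing; larger means shorter memory
      ObservedWeight float64 // 0.7 in the 70/30 observed/heuristic blend
      StrengthBonus  float64 // added for task types the arm is tagged for
      CostWeight     float64 // how strongly cost discounts quality
  }

  // Arm is per-(arm, task type) state.
  type Arm struct {
      EMA       float64 // observed quality
      Heuristic float64 // prior quality estimate
      Cost      float64
      Pulls     int
      Strong    bool // tagged as strong for this task type
  }

  // observe folds one quality sample in [0, 1] into the EMA.
  func (a *Arm) observe(k BanditKnobs, quality float64) {
      a.EMA = k.QualityAlpha*quality + (1-k.QualityAlpha)*a.EMA
      a.Pulls++
  }

  // greedyScore is pure exploitation: blended quality over a cost term.
  func greedyScore(k BanditKnobs, a Arm) float64 {
      q := k.ObservedWeight*a.EMA + (1-k.ObservedWeight)*a.Heuristic
      if a.Strong {
          q += k.StrengthBonus
      }
      return q / (1 + k.CostWeight*a.Cost)
  }

  // ucbScore adds a UCB1-style bonus that shrinks as an arm accumulates
  // pulls, so under-sampled arms still get tried occasionally.
  func ucbScore(k BanditKnobs, a Arm, totalPulls int, c float64) float64 {
      if a.Pulls == 0 {
          return math.Inf(1) // try every arm at least once
      }
      bonus := c * math.Sqrt(math.Log(float64(totalPulls))/float64(a.Pulls))
      return greedyScore(k, a) + bonus
  }

  func main() {
      k := BanditKnobs{QualityAlpha: 0.5, ObservedWeight: 0.7, StrengthBonus: 0.1, CostWeight: 1}
      a := Arm{Heuristic: 0.6, Cost: 0.5}
      a.observe(k, 0.9)
      fmt.Printf("greedy=%.3f ucb=%.3f\n", greedyScore(k, a), ucbScore(k, a, 10, 1))
  }
  ```

  With only the greedy score, an arm that had one unlucky turn can be starved indefinitely. The UCB term is the minimal change that guarantees it gets re-tried.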

- **Security boundary — egress controls + session audit log.** The
  current `Firewall` is a content boundary only (scans messages and
  tool results for secrets via regex + Shannon entropy, redacts or
  blocks, logs via `log/slog`). It does not enforce network egress —
  outgoing HTTP from tools and providers uses stock `http.Client`
  with no per-host allowlist or dial-layer interception. Two follow-
  ups surfaced from the r/SideProject v0.3.0 launch thread
  (2026-05-24, `u/Secret_Theme3192`):
  1. **Per-session audit log of blocked/redacted events** —
     grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
     user can answer "what did the firewall do this session?" in
     one command. Today the `slog` output goes to whatever sink is
     configured, with no per-session grouping.
  2. **Per-host egress allowlist (HTTP transport layer)** — open
     design question: host-level (`allow api.openai.com, deny *`)
     vs per-tool (`bash can only hit these hosts`). The reply in that
     thread asked the commenter for their mental model; revisit when
     feedback lands. The README and the v0.3.0 Reddit post oversold
     this as "network egress gated"; the phrasing was corrected in the
     same commit as this TODO entry.
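
  Both follow-ups can be sketched with the standard library. This sketch takes the host-level side of the open question purely for illustration. The following are all hypothetical: `AuditEvent`, `auditLog`, `allowlistTransport`, the `*.domain` pattern syntax, and the event kinds. Wrapping `http.RoundTripper` only covers code that accepts a custom `http.Client`; it is not dial-layer interception. Denied requests never reach the network. Each one is recorded as a single JSON object per line, the format `audit.jsonl` implies.

  ```go
  package main

  import (
      "encoding/json"
      "fmt"
      "io"
      "net/http"
      "os"
      "strings"
      "sync"
      "time"
  )

  // AuditEvent is one line of the session audit log.
  type AuditEvent struct {
      Time   time.Time `json:"time"`
      Kind   string    `json:"kind"` // e.g. "redact", "block", "egress_deny"
      Detail string    `json:"detail"`
  }

  // auditLog appends JSON Lines; the mutex keeps concurrent callers from
  // interleaving their writes.
  type auditLog struct {
      mu  sync.Mutex
      enc *json.Encoder
  }

  func newAuditLog(w io.Writer) *auditLog { return &auditLog{enc: json.NewEncoder(w)} }

  func (l *auditLog) record(ev AuditEvent) error {
      l.mu.Lock()
      defer l.mu.Unlock()
      return l.enc.Encode(ev) // Encode ends each value with '\n'
  }

  // hostAllowed matches exact hosts and "*.example.com"-style subdomain patterns.
  func hostAllowed(allow []string, host string) bool {
      for _, pat := range allow {
          if pat == host {
              return true
          }
          if strings.HasPrefix(pat, "*.") && strings.HasSuffix(host, pat[1:]) {
              return true
          }
      }
      return false
  }

  // allowlistTransport refuses requests to hosts outside allow and logs them.
  type allowlistTransport struct {
      allow []string
      base  http.RoundTripper
      audit *auditLog
  }

  func (t *allowlistTransport) RoundTrip(req *http.Request) (*http.Response, error) {
      host := req.URL.Hostname()
      if !hostAllowed(t.allow, host) {
          if req.Body != nil {
              req.Body.Close() // RoundTrippers must close the body, even on error
          }
          _ = t.audit.record(AuditEvent{Time: time.Now().UTC(), Kind: "egress_deny", Detail: host})
          return nil, fmt.Errorf("egress to %q is not allowlisted", host)
      }
      return t.base.RoundTrip(req)
  }

  func main() {
      // Stdout stands in for .gnoma/sessions/<id>/audit.jsonl, which would be
      // opened with os.O_APPEND|os.O_CREATE|os.O_WRONLY.
      audit := newAuditLog(os.Stdout)
      client := &http.Client{Transport: &allowlistTransport{
          allow: []string{"api.openai.com"},
          base:  http.DefaultTransport,
          audit: audit,
      }}
      _, err := client.Get("https://paste.example/upload")
      fmt.Println(err)
  }
  ```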

- **Tool-router specialization (functiongemma)** — gated on telemetry,
  not committed. Phase A.2 adds did-switch-rate measurement to the
  two-stage `select_category` path; Phase A.3 (LoRA fine-tune of
  `functiongemma-270m-it` as a dedicated `ArmRoleToolRouter`) only
  fires if did-switch rate exceeds 20 %. Three independent external
  reviews consulted 2026-05-23; consensus is "fits as tool-call
  router, not chat; fine-tuning mandatory; prove the need first."
  See
  [`docs/superpowers/plans/2026-05-23-tool-router-specialization.md`](docs/superpowers/plans/2026-05-23-tool-router-specialization.md).
- **Entropy FP reduction (post-SLM Phase F)** — F-1 (format-aware
  pre-extractor) shipped 2026-05-22: `[security].entropy_safelist`
  with `uuid`, `sha_hex`, `iso8601`, `url`; default empty so
  pre-F-1 behaviour is unchanged. F-2 (SLM-assisted classifier for
  ambiguous entropy hits) remains gated on F-1 FP-rate telemetry
  from real workloads plus ≥50 SLM observations. Surfaced from the
  r/ollama launch thread (2026-05-20); external validation from
  alterlab.io on the same tiered approach. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
- **Compound tools (post-SLM Phase E)** — held until ≥50 SLM
  observations inform which primitives are worth adding. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
- **Sensitive-content handling — unified policy.** Three input paths
  can introduce sensitive content into the context: pasted images
  (screenshots may contain secrets, API keys, PII), pasted text (often
  copied straight from a terminal with credentials), and tool-read
  files (`.env`, key files, etc.). Today these are handled
  inconsistently: incognito gates persistence but content still flows
  to providers; outgoing-scan firewall covers some patterns but is
  format-aware only for text. Need a single policy/UI: at-paste
  warning when the content matches sensitive heuristics, a
  consent-gated review step, and consistent treatment across the
  three paths. Cross-cuts with Phase F entropy work and the
  outgoing-scan firewall. Plan:
  [`docs/superpowers/plans/2026-05-24-sensitive-content-policy.md`](docs/superpowers/plans/2026-05-24-sensitive-content-policy.md).
- **Distribution — follow-ups.** v0.1.0 shipped (archives on
  github.com/VikingOwl91/gnoma/releases, multi-arch images on
  ghcr.io/vikingowl91/gnoma). Still optional:
  - Homebrew tap.
  - `curl | sh` installer script.
  - Signed checksums (cosign/sigstore).
  - Release note automation.
  - Windows process-tree kill via golang.org/x/sys/windows job
    objects (currently `os.Process.Kill` only; see
    `internal/mcp/transport_windows.go`).
  - Migration from `dockers` + `docker_manifests` to `dockers_v2` in
    `.goreleaser.yml`. Collapses ~45 lines into one block but
    requires Dockerfile changes for the per-platform binary layout;
    deferred to its own commit before v0.3.0.
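
The F-1 idea from the entropy item above can be sketched as follows: recognise known high-entropy formats before scoring, so they never reach the entropy check. Only the safelist names (`uuid`, `sha_hex`, `iso8601`, `url`) come from the config. Everything else is an illustrative assumption, not gnoma's actual values: the regexes, the 3.5 bits-per-byte threshold, and the 20-character minimum.

```go
package main

import (
    "fmt"
    "math"
    "regexp"
)

// Illustrative matchers keyed by entropy_safelist name.
var safeFormats = map[string]*regexp.Regexp{
    "uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`),
    "sha_hex": regexp.MustCompile(`^(?:[0-9a-f]{40}|[0-9a-f]{64})$`),
    "iso8601": regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$`),
    "url":     regexp.MustCompile(`^https?://\S+$`),
}

// shannon returns the Shannon entropy of s in bits per byte.
func shannon(s string) float64 {
    if s == "" {
        return 0
    }
    var counts [256]int
    for i := 0; i < len(s); i++ {
        counts[s[i]]++
    }
    n := float64(len(s))
    h := 0.0
    for _, c := range counts {
        if c > 0 {
            p := float64(c) / n
            h -= p * math.Log2(p)
        }
    }
    return h
}

// suspicious reports whether token should be treated as a possible secret:
// it must not match an enabled safelisted format, and it must be long and
// high-entropy enough.
func suspicious(token string, safelist []string) bool {
    for _, name := range safelist {
        if re, ok := safeFormats[name]; ok && re.MatchString(token) {
            return false
        }
    }
    return len(token) >= 20 && shannon(token) >= 3.5
}

func main() {
    id := "01234567-89ab-cdef-0123-456789abcdef"
    fmt.Printf("entropy=%.2f\n", shannon(id))
    fmt.Println(suspicious(id, nil))              // true: flagged as a possible secret
    fmt.Println(suspicious(id, []string{"uuid"})) // false: recognised as a UUID
}
```

Matching whole tokens is the simplest reading of the safelist. A URL carrying a token in its query string would slip through, so a real pre-extractor likely needs finer-grained matching for `url`.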

## Stable backlog (not in active phases)

- **Thinking mode** (disabled / budget / adaptive) — M12.
- **Structured output** with JSON schema validation — M12.
- **Native agy JSON output** — switch the subprocess provider to
  `--output-format stream-json` once the agy CLI supports it,
  replacing the current prompt-augmentation fallback. Until then,
  agy's `ToolUse` capability is set to `false` (see
  `internal/provider/subprocess/agent.go` agy entry) — without
  structured tool-call output, the router would otherwise dispatch
  tool-needing tasks to agy and the turn would hang on prose
  hallucinations of tool calls. Flip the capability back to `true`
  in the same change that lands stream-json parsing.
- **SQLite session persistence** + serve mode — M10.
- **Task learning** (pattern recognition, persistent tasks) — M11.
- **Web UI** (`gnoma web`) — M15.
- **OAuth / keyring** — M13.
- **Observability** (feature flags, cost dashboards) — M14.
- **PE / Mach-O support** — future, after ELF Phase 6.

## History

Completed initiatives, kept here as pointers to their plan files:

- **v0.1.0 release** — 2026-05-20. First tagged release. GoReleaser
  pipeline produces six static archives (linux/darwin/windows ×
  amd64/arm64) on the GitHub mirror plus multi-arch Docker images on
  GHCR. History was rewritten on the same day to migrate authorship to
  a noreply identity and strip co-author attribution.

- **Post-audit security hardening** — complete 2026-05-19. Three waves
  + one ADR closed all 14 findings from the external review:
  - [Wave 1 — SafeProvider boundary](docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md)
  - [Wave 2 — Incognito coherence](docs/superpowers/plans/2026-05-19-security-wave2-incognito.md)
  - Wave 3 — scanner + path hygiene (rolled out directly without a
    plan file; see commits leading up to 2026-05-19 on `internal/security`)
  - [ADR-004 — PostToolUse hook ordering](docs/essentials/decisions/004-posttooluse-hook-ordering.md)
- **Post-SLM unlock** —
  [plan](docs/superpowers/plans/2026-05-19-post-slm-unlock.md). Phases
  A–D complete (two-stage tool routing, CLI agent binary override,
  user profiles, per-arm capability tags).
- **2026-05-07 roadmap** —
  [plan](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md). M1–M8
  done; SLM classifier (Phase 3) complete; Phase 4 superseded by the
  post-SLM plan.

## Reference

- Milestones: `docs/essentials/milestones.md`
- Decisions: `docs/essentials/decisions/`
- ADR-002 (SLM routing, supersedes earlier ADR-009): `docs/essentials/decisions/002-slm-routing.md`