# Gnoma — TODO

Active work, newest first.

## In flight

- **Bandit selector — design decisions deferred.** The current
  selector (`internal/router/selector.go:scoreArm`) is greedy
  quality-weighted: per-(arm × task-type) EMA scores blended 70/30
  with heuristic defaults, divided by CostWeight-adjusted cost. It
  is **not** a true multi-armed bandit — no UCB-style exploration
  bonus, no Thompson sampling. Tracked as a design question rather
  than a must-implement item because of two open dependencies:

  1. **Whether to keep numeric EMA at all.** The 2026-05-07 roadmap
     (Phase 4) puts re-evaluating bandit learning on hold until the
     SLM-driven dispatcher is in production. Three options on the
     table: keep bandit as feedback for the SLM, retire EMA in
     favour of qualitative outcome summaries fed to the SLM, or
     split responsibilities (SLM = intent routing, bandit =
     cost/quality within a tier). See
     [`docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md)
     §Phase 4.

  2. **User-tunable selector knobs.** Several constants are
     hardcoded today: `qualityAlpha` (EMA smoothing, ~3-sample
     memory), the 70/30 observed/heuristic blend,
     `strengthScoreBonus` for tagged task types, and the
     `DefaultThresholds.Minimum` quality floor. Surfacing these as
     `[router.bandit]` config keys would let users tune for their
     workloads (faster alpha for shifting model performance, longer
     memory for stable fleets) without waiting for the strategic
     decision in item 1.

  Surfaced from the r/coolgithubprojects v0.3.1 launch thread
  (2026-05-24, `u/Ha_Deal_5079`).

- **Security boundary — egress controls + session audit log.** The
  current `Firewall` is a content boundary only (scans messages and
  tool results for secrets via regex + Shannon entropy, redacts or
  blocks, logs via `log/slog`). It does not enforce network egress —
  outgoing HTTP from tools and providers uses stock `http.Client`
  with no per-host allowlist or dial-layer interception. Two
  follow-ups surfaced from the r/SideProject v0.3.0 launch thread
  (2026-05-24, `u/Secret_Theme3192`):
  1. **Per-session audit log of blocked/redacted events** —
     grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
     user can answer "what did the firewall do this session?" in
     one command. Today the `slog` output goes to whatever sink is
     configured, with no per-session grouping.
  2. **Per-host egress allowlist (HTTP transport layer)** — open
     design question: host-level (`allow api.openai.com, deny *`)
     vs per-tool (`bash can only hit these hosts`). Our reply asked
     the commenter for their mental model; revisit when feedback
     lands. The README and the v0.3.0 Reddit post oversold this as
     "network egress gated"; the wording was corrected in the same
     commit as this TODO entry.

- **Tool-router specialization (functiongemma)** — gated on telemetry,
  not committed. Phase A.2 adds did-switch-rate measurement to the
  two-stage `select_category` path; Phase A.3 (LoRA fine-tune of
  `functiongemma-270m-it` as a dedicated `ArmRoleToolRouter`) only
  fires if the did-switch rate exceeds 20%. Three independent external
  reviews consulted 2026-05-23; consensus is "fits as tool-call
  router, not chat; fine-tuning mandatory; prove the need first."
  See
  [`docs/superpowers/plans/2026-05-23-tool-router-specialization.md`](docs/superpowers/plans/2026-05-23-tool-router-specialization.md).
- **Entropy FP reduction (post-SLM Phase F)** — F-1 (format-aware
  pre-extractor) shipped 2026-05-22: `[security].entropy_safelist`
  with `uuid`, `sha_hex`, `iso8601`, `url`; default empty so
  pre-F-1 behaviour is unchanged. F-2 (SLM-assisted classifier for
  ambiguous entropy hits) remains gated on F-1 FP-rate telemetry
  from real workloads plus ≥50 SLM observations. Surfaced from the
  r/ollama launch thread (2026-05-20); external validation from
  alterlab.io on the same tiered approach. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
- **Compound tools (post-SLM Phase E)** — held until ≥50 SLM
  observations inform which primitives are worth adding. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
- **Sensitive-content handling — unified policy.** Three input paths
  can introduce sensitive content into the context: pasted images
  (screenshots may contain secrets, API keys, PII), pasted text (often
  copied straight from a terminal with credentials), and tool-read
  files (`.env`, key files, etc.). Today these are handled
  inconsistently: incognito gates persistence, but content still flows
  to providers; the outgoing-scan firewall covers some patterns but is
  format-aware only for text. We need a single policy/UI: an at-paste
  warning when the content matches sensitive heuristics, a
  consent-gated review step, and consistent treatment across the
  three paths. Cross-cuts with Phase F entropy work and the
  outgoing-scan firewall.
- **Distribution — follow-ups.** v0.1.0 shipped (archives on
  github.com/VikingOwl91/gnoma/releases, multi-arch images on
  ghcr.io/vikingowl91/gnoma). Still optional: Homebrew tap,
  `curl | sh` installer script, signed checksums (cosign/sigstore),
  release note automation, Windows process-tree kill via
  golang.org/x/sys/windows job objects (currently `os.Process.Kill`
  only — see `internal/mcp/transport_windows.go`), and migration
  from `dockers` + `docker_manifests` to `dockers_v2` in
  `.goreleaser.yml` (collapses ~45 lines into one block but
  requires Dockerfile changes for the per-platform binary layout
  — deferred to its own commit before v0.3.0).

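For the bandit selector item above, the greedy scoring and the knobs proposed for `[router.bandit]` can be sketched roughly as follows. This is an illustration only, not the code in `internal/router/selector.go`: the `Knobs` and `Arm` field names, the numeric values, the `1 + CostWeight*cost` denominator, and the order in which the bonus and the floor apply are all assumptions. An alpha of 0.5 is used because, under the common `alpha = 2/(N+1)` convention, it corresponds to a span of about three samples.

```go
package main

import "fmt"

// Knobs groups the constants the TODO proposes exposing as config.
// Names and values are hypothetical, chosen only for illustration.
type Knobs struct {
	QualityAlpha   float64 // EMA smoothing factor
	ObservedWeight float64 // observed share of the blend (0.7 gives 70/30)
	StrengthBonus  float64 // bonus when the arm is tagged for the task type
	MinQuality     float64 // quality floor; arms below it are skipped
	CostWeight     float64 // how strongly cost discounts quality
}

// Arm holds per-(arm, task-type) state for one candidate.
type Arm struct {
	Name      string
	Observed  float64 // EMA of observed quality
	Heuristic float64 // static default quality
	Cost      float64 // relative cost per call
	Tagged    bool    // arm declares a strength for this task type
}

// updateEMA folds one quality sample into the running average.
func updateEMA(prev, sample, alpha float64) float64 {
	return alpha*sample + (1-alpha)*prev
}

// score returns the cost-adjusted quality and whether the arm clears the floor.
func score(k Knobs, a Arm) (float64, bool) {
	q := k.ObservedWeight*a.Observed + (1-k.ObservedWeight)*a.Heuristic
	if a.Tagged {
		q += k.StrengthBonus
	}
	if q < k.MinQuality {
		return 0, false
	}
	return q / (1 + k.CostWeight*a.Cost), true
}

// pick is purely greedy: the highest score wins, with no exploration bonus.
func pick(k Knobs, arms []Arm) (string, bool) {
	best, bestScore, found := "", 0.0, false
	for _, a := range arms {
		if s, ok := score(k, a); ok && (!found || s > bestScore) {
			best, bestScore, found = a.Name, s, true
		}
	}
	return best, found
}

func main() {
	k := Knobs{QualityAlpha: 0.5, ObservedWeight: 0.7, StrengthBonus: 0.1, MinQuality: 0.3, CostWeight: 1}
	arms := []Arm{
		{Name: "large", Observed: 0.9, Heuristic: 0.8, Cost: 4},
		{Name: "small", Observed: 0.7, Heuristic: 0.7, Cost: 1},
		{Name: "weak", Observed: 0.1, Heuristic: 0.2, Cost: 0},
	}
	name, ok := pick(k, arms)
	fmt.Println(name, ok)
	fmt.Println(updateEMA(0.5, 1.0, k.QualityAlpha))
}
```
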
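For the entropy FP reduction item above, the idea of a format-aware pre-extractor in front of a Shannon-entropy check can be sketched as below. The safelist names (`uuid`, `sha_hex`, `iso8601`, `url`) come from the item itself; the regular expressions, the function names, and the 3.0 threshold are assumptions for illustration and may differ from what F-1 actually ships. As in the item, an empty safelist leaves the entropy check untouched.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// formats maps safelist names to patterns. The names come from the TODO;
// the regular expressions are assumptions, not Gnoma's actual matchers.
var formats = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^(?:[0-9a-f]{40}|[0-9a-f]{64})$`),
	"iso8601": regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$`),
	"url":     regexp.MustCompile(`^https?://\S+$`),
}

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// flagged reports whether token would be treated as a likely secret.
// Tokens matching an enabled safelist format are skipped before the
// entropy check; unknown safelist names are ignored.
func flagged(token string, threshold float64, safelist []string) bool {
	for _, name := range safelist {
		if re, ok := formats[name]; ok && re.MatchString(token) {
			return false
		}
	}
	return shannonEntropy(token) > threshold
}

func main() {
	const threshold = 3.0 // illustrative cut-off, not Gnoma's default
	id := "123e4567-e89b-12d3-a456-426614174000"
	fmt.Println(flagged(id, threshold, nil))              // empty safelist: entropy check only
	fmt.Println(flagged(id, threshold, []string{"uuid"})) // uuid format is skipped
}
```
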
## Stable backlog (not in active phases)

- **Thinking mode** (disabled / budget / adaptive) — M12.
- **Structured output** with JSON schema validation — M12.
- **Native agy JSON output** — switch the subprocess provider to
  `--output-format stream-json` once the agy CLI supports it,
  replacing the current prompt-augmentation fallback.
- **SQLite session persistence** + serve mode — M10.
- **Task learning** (pattern recognition, persistent tasks) — M11.
- **Web UI** (`gnoma web`) — M15.
- **OAuth / keyring** — M13.
- **Observability** (feature flags, cost dashboards) — M14.
- **PE / Mach-O ELF support** — future, after ELF Phase 6.

## History

Completed initiatives, kept here as pointers to their plan files:

- **v0.1.0 release** — 2026-05-20. First tagged release. GoReleaser
  pipeline produces six static archives (linux/darwin/windows ×
  amd64/arm64) on the GitHub mirror plus multi-arch Docker images on
  GHCR. History was rewritten on the same day to migrate authorship to
  a noreply identity and strip co-author attribution.

- **Post-audit security hardening** — complete 2026-05-19. Three waves
  + one ADR closed all 14 findings from the external review:
  - [Wave 1 — SafeProvider boundary](docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md)
  - [Wave 2 — Incognito coherence](docs/superpowers/plans/2026-05-19-security-wave2-incognito.md)
  - Wave 3 — scanner + path hygiene (rolled out directly without a
    plan file; see commits leading up to 2026-05-19 on `internal/security`)
  - [ADR-004 — PostToolUse hook ordering](docs/essentials/decisions/004-posttooluse-hook-ordering.md)
- **Post-SLM unlock** —
  [plan](docs/superpowers/plans/2026-05-19-post-slm-unlock.md). Phases
  A–D complete (two-stage tool routing, CLI agent binary override,
  user profiles, per-arm capability tags).
- **2026-05-07 roadmap** —
  [plan](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md). M1–M8
  done; SLM classifier (Phase 3) complete; Phase 4 superseded by the
  post-SLM plan.

## Reference

- Milestones: `docs/essentials/milestones.md`
- Decisions: `docs/essentials/decisions/`
- ADR-002 (SLM routing, supersedes earlier ADR-009): `docs/essentials/decisions/002-slm-routing.md`