# Gnoma — TODO

Active work, newest first.

## In flight

- **Config write/merge — silent corruption of layered configs.**
  `internal/config/write.go:setConfig` reads the existing TOML into a
  zero-valued `Config` struct, sets one field, and writes the entire
  struct back out. As a result, every untouched field gets serialized at its
  Go zero value (empty strings, zero ints, `false` bools). On the
  next load, those explicit zeros override whatever lower-priority layers
  and built-in defaults would have supplied, because `toml.Decode` follows
  "present field beats absent field" semantics.

  Concrete symptom (2026-05-24): the user's `~/.config/gnoma/config.toml`
  had `[router].prefer = "cloud"`, but the project-level
  `.gnoma/config.toml` had `prefer = ""` (generated by an earlier
  `gnoma config set ...` call). That silently downgraded the
  effective policy to `auto`. The change was visible only through the new `/router`
  TUI command, and no warning was shown.

  The same root cause explains the zero-spammed global config
  that user has (`max_tokens = 0`, `permission.mode = ""`,
  `bash_timeout = 0`, etc.). All of these entries overwrite sensible defaults.
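
  The sketch below shows the failure mode. It assumes the loader decodes each layer, lowest priority first, into one shared struct. It uses `encoding/json` as a stdlib stand-in for the TOML decoder, because both overwrite only the fields that are present in the input. `RouterConfig`, `Config`, `loadLayers`, and the `8192` value are hypothetical.

  ```go
  package main

  import (
  	"encoding/json"
  	"fmt"
  )

  // Hypothetical stand-ins for gnoma's config structs.
  type RouterConfig struct {
  	Prefer string `json:"prefer"`
  }

  type Config struct {
  	Router    RouterConfig `json:"router"`
  	MaxTokens int          `json:"max_tokens"`
  }

  // loadLayers decodes each layer, lowest priority first, into one struct.
  // Fields absent from a layer keep their current value. Fields present in a
  // layer are overwritten, even when the new value is a Go zero value.
  func loadLayers(layers ...string) (Config, error) {
  	var cfg Config
  	for _, layer := range layers {
  		if err := json.Unmarshal([]byte(layer), &cfg); err != nil {
  			return Config{}, err
  		}
  	}
  	return cfg, nil
  }

  func main() {
  	global := `{"router": {"prefer": "cloud"}, "max_tokens": 8192}`
  	cleanProject := `{}`
  	spammedProject := `{"router": {"prefer": ""}, "max_tokens": 0}` // zero-spam, as setConfig would write it

  	for _, project := range []string{cleanProject, spammedProject} {
  		cfg, err := loadLayers(global, project)
  		if err != nil {
  			panic(err)
  		}
  		fmt.Printf("%+v\n", cfg)
  	}
  }
  ```

  The clean project layer leaves `prefer = "cloud"` intact. The zero-spammed layer resets both fields, which reproduces the `prefer = ""` downgrade described above.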

  **Fix surface (multi-part, plan-worthy):**

  1. **Stop generating zero-spam.** Three options:
     - Tag struct fields with `,omitempty` so the BurntSushi encoder
       skips zero values. Caveat: this conflates "unset" with "explicitly
       zero" for primitive types (a user who wants `max_keep = 0`
       loses it). It is safe for strings/maps/slices, where empty is never
       user intent, but lossy for numeric fields.
     - Switch to `pelletier/go-toml/v2` and use its document model
       to edit only the targeted key, preserving everything else
       byte-for-byte. Cleaner semantics, bigger refactor.
     - Hybrid: `omitempty` on string/map/slice fields, document-level
       edit for numerics. This is the fastest path that doesn't lose intent.
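
     The `omitempty` caveat can be demonstrated with `encoding/json`, whose tag treats `0`, `false`, `""`, and `nil` as empty. This sketch uses a hypothetical `max_keep` field. Check the TOML encoder's own tag semantics in its documentation before relying on the analogy. A pointer field appears alongside because pointers are the usual Go way to keep "unset" and "explicitly zero" apart.

     ```go
     package main

     import (
     	"encoding/json"
     	"fmt"
     )

     // With omitempty on a plain int, "unset" and "explicitly 0" look identical.
     type PlainKeep struct {
     	MaxKeep int `json:"max_keep,omitempty"`
     }

     // With a pointer, only nil is omitted; a pointer to 0 is still written.
     type PtrKeep struct {
     	MaxKeep *int `json:"max_keep,omitempty"`
     }

     func encode(v any) string {
     	b, err := json.Marshal(v)
     	if err != nil {
     		panic(err)
     	}
     	return string(b)
     }

     func main() {
     	zero := 0
     	fmt.Println(encode(PlainKeep{MaxKeep: 0}))   // the user's explicit 0 is lost
     	fmt.Println(encode(PtrKeep{}))               // unset: omitted, as intended
     	fmt.Println(encode(PtrKeep{MaxKeep: &zero})) // explicit 0: preserved
     }
     ```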

  2. **`gnoma doctor` — read-only diagnostic.** Scans both global
     and project configs and reports:
     - Zero-spam fields that would silently shadow defaults or
       upstream layers.
     - Invalid enum values (e.g. `permission.mode = ""`).
     - Unknown / removed keys from older schema versions.
     - Effective merged values (so the user sees what gnoma will
       actually use after layering).

     No writes. Exits non-zero on findings so it's CI-friendly.
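
     This is a minimal sketch of the doctor checks over one config layer. It assumes layers are flattened to `section.key` maps. The `lower` map holds the value each known key would resolve to without this layer, meaning defaults merged with lower-priority layers. `Finding`, `diagnose`, the enum table, and all values are hypothetical. The enum list is an illustrative subset, not gnoma's real one.

     ```go
     package main

     import (
     	"fmt"
     	"sort"
     )

     // Finding is one problem a doctor run would report (hypothetical shape).
     type Finding struct {
     	Key    string
     	Reason string
     }

     // isZero covers the scalar types a TOML decoder typically yields into map[string]any.
     func isZero(v any) bool {
     	switch x := v.(type) {
     	case string:
     		return x == ""
     	case int64:
     		return x == 0
     	case float64:
     		return x == 0
     	case bool:
     		return !x
     	}
     	return false
     }

     func contains(list []string, s string) bool {
     	for _, x := range list {
     		if x == s {
     			return true
     		}
     	}
     	return false
     }

     // diagnose checks one flattened layer ("section.key" -> value). lower holds
     // what each known key resolves to without this layer; enums lists allowed values.
     func diagnose(layer, lower map[string]any, enums map[string][]string) []Finding {
     	keys := make([]string, 0, len(layer))
     	for k := range layer {
     		keys = append(keys, k)
     	}
     	sort.Strings(keys) // deterministic report order

     	var out []Finding
     	for _, k := range keys {
     		v := layer[k]
     		below, known := lower[k]
     		if !known {
     			out = append(out, Finding{k, "unknown or removed key"})
     			continue
     		}
     		if allowed, isEnum := enums[k]; isEnum {
     			if s, ok := v.(string); !ok || !contains(allowed, s) {
     				out = append(out, Finding{k, fmt.Sprintf("invalid enum value %q", fmt.Sprint(v))})
     				continue
     			}
     		}
     		if isZero(v) && !isZero(below) {
     			out = append(out, Finding{k, fmt.Sprintf("zero value shadows %v", below)})
     		}
     	}
     	return out
     }

     func main() {
     	layer := map[string]any{"router.prefer": "", "max_tokens": int64(0), "legacy.key": true}
     	lower := map[string]any{"router.prefer": "cloud", "max_tokens": int64(8192)}
     	enums := map[string][]string{"router.prefer": {"auto", "cloud"}} // illustrative subset

     	findings := diagnose(layer, lower, enums)
     	for _, f := range findings {
     		fmt.Printf("%s: %s\n", f.Key, f.Reason)
     	}
     	exit := 0
     	if len(findings) > 0 {
     		exit = 1 // a real command would call os.Exit(exit) here
     	}
     	fmt.Println("exit status:", exit)
     }
     ```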

  3. **`gnoma upgrade-config` — active migration.** For each config
     file (global, profiles, project):
     - Compute the cleaned form (only fields the user actually set,
       dropping zeros that match defaults).
     - Write the original to `<path>.bak` with a timestamp suffix.
     - Write the cleaned form to the original path.
     - Print a diff of what changed so the user can verify.
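
     The sketch below computes the cleaned form under an assumed rule: a known key is dropped when its value is a Go zero value or equals the built-in default. Unknown keys are kept so `gnoma doctor` can report them. `cleanLayer` and the defaults are hypothetical. This rule still discards an intentional `max_keep = 0`, which is the same trade-off as `omitempty` in option 1.

     ```go
     package main

     import (
     	"fmt"
     	"reflect"
     	"sort"
     )

     // isZero treats nil and any Go zero value as zero-spam.
     func isZero(v any) bool {
     	return v == nil || reflect.ValueOf(v).IsZero()
     }

     // cleanLayer drops known keys whose value is zero-spam or equals the default,
     // keeps everything else (including unknown keys, left for gnoma doctor), and
     // returns the dropped keys sorted so they can be printed as a diff.
     func cleanLayer(layer, defaults map[string]any) (map[string]any, []string) {
     	cleaned := make(map[string]any, len(layer))
     	var dropped []string
     	for k, v := range layer {
     		if def, known := defaults[k]; known && (isZero(v) || reflect.DeepEqual(v, def)) {
     			dropped = append(dropped, k)
     			continue
     		}
     		cleaned[k] = v
     	}
     	sort.Strings(dropped)
     	return cleaned, dropped
     }

     func main() {
     	// Illustrative defaults, not gnoma's real values.
     	defaults := map[string]any{"max_tokens": int64(8192), "bash_timeout": int64(120), "router.prefer": "auto"}
     	layer := map[string]any{"max_tokens": int64(0), "bash_timeout": int64(120), "router.prefer": "cloud"}

     	cleaned, dropped := cleanLayer(layer, defaults)
     	for _, k := range dropped {
     		fmt.Printf("- %s = %v\n", k, layer[k]) // diff lines for the user
     	}
     	fmt.Printf("kept: %v\n", cleaned)
     }
     ```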

  4. **Project-level auto-migration on startup.** If gnoma detects
     a zero-spammed project `.gnoma/config.toml` at launch:
     - Auto-run the upgrade (project only; never auto-touch the
       global config).
     - Write `.gnoma/config.toml.bak-YYYY-MM-DD-HHMMSS`.
     - Surface a one-line notice in the startup safety banner:
       `config: migrated .gnoma/config.toml (see .bak)`.
     - The auto-migration is non-destructive (`.bak` preserves the
       original) but is still gated behind a `[config].auto_migrate`
       toggle, defaulting to `true`. Global configs require an
       explicit `gnoma upgrade-config`.
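
     The sketch below covers the backup name and the startup gate, using hypothetical helpers. Go formats times from a reference layout, so `2006-01-02-150405` produces the `YYYY-MM-DD-HHMMSS` suffix above. The toggle is modelled as a `*bool`, which is an assumption. With a pointer, an absent `auto_migrate` key defaults to `true` while an explicit `false` still wins. That is the unset-versus-zero distinction this whole entry is about.

     ```go
     package main

     import (
     	"fmt"
     	"time"
     )

     // backupPath appends the timestamped .bak suffix; the layout
     // "2006-01-02-150405" renders as YYYY-MM-DD-HHMMSS.
     func backupPath(path string, now time.Time) string {
     	return path + ".bak-" + now.Format("2006-01-02-150405")
     }

     // shouldAutoMigrate applies the startup rules: project configs only, only
     // when zero-spam was detected, and only if auto_migrate is not explicitly
     // false. A nil toggle means the key was absent, so the default (true) applies.
     func shouldAutoMigrate(isProjectConfig, hasZeroSpam bool, autoMigrate *bool) bool {
     	enabled := autoMigrate == nil || *autoMigrate
     	return isProjectConfig && hasZeroSpam && enabled
     }

     func main() {
     	at := time.Date(2026, 5, 24, 19, 23, 0, 0, time.UTC)
     	fmt.Println(backupPath(".gnoma/config.toml", at))
     	fmt.Println(shouldAutoMigrate(true, true, nil))  // project, toggle unset
     	fmt.Println(shouldAutoMigrate(false, true, nil)) // global: never automatic
     }
     ```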

  5. **Project registry** (`~/.config/gnoma/projects.json`). Today
     there is no record of which directories gnoma has been launched
     in. Items #2 and #3 can work with a filesystem scan
     (`find ~ -type d -name .gnoma`), but a registry makes them
     significantly faster and unlocks cross-project features.
     Sketch:

     ```json
     {
       "projects": [
         {
           "path": "/home/.../my-repo",
           "first_seen": "2026-04-15T10:30:00Z",
           "last_seen": "2026-05-24T19:23:00Z",
           "session_count": 47
         }
       ]
     }
     ```

     Update the registry on every successful startup: record the project root,
     bump `last_seen`, and increment `session_count`. This enables:
     - Fast `gnoma doctor --all-projects` without a filesystem walk.
     - Cross-project session listing (a `gnoma sessions --all`
       picker that surfaces the most recent sessions across the registry).
     - `gnoma upgrade-config` that can migrate every known project
       in one invocation.
     - Future local-only aggregate stats (`gnoma stats`). These would still be
       no-phone-home, just a sum across the registry.
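
     The sketch below shows the registry types and the per-startup update. Field names come from the JSON above. `Registry`, `Project`, and `Touch` are hypothetical. `time.Time` marshals to RFC 3339, which matches the timestamps in the sketch. Persisting the result should go through an atomic write, per the caveats below.

     ```go
     package main

     import (
     	"encoding/json"
     	"fmt"
     	"time"
     )

     // Project mirrors one entry of projects.json (hypothetical Go types).
     type Project struct {
     	Path         string    `json:"path"`
     	FirstSeen    time.Time `json:"first_seen"`
     	LastSeen     time.Time `json:"last_seen"`
     	SessionCount int       `json:"session_count"`
     }

     type Registry struct {
     	Projects []Project `json:"projects"`
     }

     // Touch records one successful startup in root: it adds the project on
     // first sight, otherwise bumps last_seen and increments session_count.
     func (r *Registry) Touch(root string, now time.Time) {
     	for i := range r.Projects {
     		if r.Projects[i].Path == root {
     			r.Projects[i].LastSeen = now
     			r.Projects[i].SessionCount++
     			return
     		}
     	}
     	r.Projects = append(r.Projects, Project{Path: root, FirstSeen: now, LastSeen: now, SessionCount: 1})
     }

     func main() {
     	var reg Registry
     	t0 := time.Date(2026, 4, 15, 10, 30, 0, 0, time.UTC)
     	reg.Touch("/tmp/my-repo", t0)
     	reg.Touch("/tmp/my-repo", t0.Add(24*time.Hour))
     	out, err := json.MarshalIndent(reg, "", "  ")
     	if err != nil {
     		panic(err)
     	}
     	fmt.Println(string(out))
     }
     ```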

     **Caveats and design constraints:**
     - The registry file becomes another silent-corruption surface.
       It must use the same `omitempty` / atomic-write discipline
       as the encoder fix in #1, or it will exhibit the same class
       of bug.
     - Stale entries (deleted projects): `gnoma doctor` should
       detect them and offer to prune them; do not auto-delete.
     - Privacy: this is literally a log of directories the user
       has worked in. It is local-only and never sent off-machine (per the
       no-phone-home positioning), but it deserves a one-line note in
       the Security section of the README so users know it exists.
     - Opt-out: `[config].project_registry = false` for users who
       don't want this tracked. Default `true`.
     - Atomic writes (temp file + rename) so a crash mid-write
       doesn't corrupt the file.
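
     The sketch below shows the temp-file-plus-rename pattern using only the stdlib. `writeFileAtomic` is a hypothetical helper. The temp file is created in the destination directory so the rename never crosses filesystems, and on POSIX a rename within one filesystem replaces the target atomically. Fully durable writes would also fsync the parent directory, which this sketch omits.

     ```go
     package main

     import (
     	"fmt"
     	"os"
     	"path/filepath"
     )

     // writeFileAtomic writes data to a temp file next to path, flushes it to disk,
     // then renames it over path, so readers see either the old or the new contents.
     func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
     	tmp, err := os.CreateTemp(filepath.Dir(path), "."+filepath.Base(path)+".tmp-*")
     	if err != nil {
     		return err
     	}
     	tmpName := tmp.Name()
     	// Clean up on failure; after a successful rename tmpName no longer exists
     	// and the error from Remove is ignored.
     	defer os.Remove(tmpName)

     	if _, err := tmp.Write(data); err != nil {
     		tmp.Close()
     		return err
     	}
     	if err := tmp.Sync(); err != nil { // flush before the rename makes it visible
     		tmp.Close()
     		return err
     	}
     	if err := tmp.Close(); err != nil {
     		return err
     	}
     	if err := os.Chmod(tmpName, perm); err != nil {
     		return err
     	}
     	return os.Rename(tmpName, path)
     }

     func main() {
     	dir, err := os.MkdirTemp("", "registry-demo")
     	if err != nil {
     		panic(err)
     	}
     	defer os.RemoveAll(dir)

     	path := filepath.Join(dir, "projects.json")
     	for _, body := range []string{`{"projects": []}`, `{"projects": [{"path": "/tmp/my-repo"}]}`} {
     		if err := writeFileAtomic(path, []byte(body), 0o600); err != nil {
     			panic(err)
     		}
     	}
     	got, err := os.ReadFile(path)
     	if err != nil {
     		panic(err)
     	}
     	fmt.Println(string(got))
     }
     ```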

  Surfaced from the v0.3.1 launch wave (2026-05-24).
  Plan:
  [`docs/superpowers/plans/2026-05-24-config-migration.md`](docs/superpowers/plans/2026-05-24-config-migration.md).

- **Bandit selector — design decisions deferred.** The current
  selector (`internal/router/selector.go:scoreArm`) is greedy and
  quality-weighted. Per-(arm × task-type) EMA scores are blended 70/30
  with heuristic defaults, then divided by the CostWeight-adjusted cost. It
  is **not** a true multi-armed bandit: it has no UCB-style exploration
  bonus and no Thompson sampling. This is tracked as a design question rather
  than a must-implement item because of two open dependencies:
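
  The sketch below shows the scoring shape described above next to the exploration term it lacks. The constant values and the cost adjustment are assumptions. `qualityAlpha = 0.5` follows the common `alpha = 2/(N+1)` convention for a 3-sample span. The `1 + costWeight*cost` cost term is invented for illustration. `ucbBonus` is the standard UCB1 term, and with `c = √2` it reduces to `sqrt(2 ln N / n_i)`.

  ```go
  package main

  import (
  	"fmt"
  	"math"
  )

  // Assumed values for the hardcoded constants described above.
  const (
  	qualityAlpha  = 0.5 // alpha = 2/(N+1) with a 3-sample span
  	observedShare = 0.7 // the 70/30 observed/heuristic blend
  )

  // updateEMA folds one observed quality score into the running estimate.
  func updateEMA(prev, observed float64) float64 {
  	return qualityAlpha*observed + (1-qualityAlpha)*prev
  }

  // greedyScore blends observed and heuristic quality, then divides by a
  // cost term; the 1 + costWeight*cost form is a hypothetical stand-in.
  func greedyScore(ema, heuristic, cost, costWeight float64) float64 {
  	quality := observedShare*ema + (1-observedShare)*heuristic
  	return quality / (1 + costWeight*cost)
  }

  // ucbBonus is the UCB1 exploration term: large for rarely tried arms,
  // shrinking as an arm accumulates pulls. c = √2 gives classic UCB1.
  func ucbBonus(totalPulls, armPulls int, c float64) float64 {
  	if armPulls == 0 {
  		return math.Inf(1) // every arm gets tried at least once
  	}
  	return c * math.Sqrt(math.Log(float64(totalPulls))/float64(armPulls))
  }

  func main() {
  	ema := updateEMA(0.6, 1.0)
  	greedy := greedyScore(ema, 0.5, 0.2, 1.0)
  	// A UCB-style selector would rank arms by greedy + bonus instead.
  	fmt.Printf("greedy=%.3f  with bonus (2 of 100 pulls)=%.3f\n",
  		greedy, greedy+ucbBonus(100, 2, math.Sqrt2))
  }
  ```

  With the bonus added, the selector picks the arg-max of `greedy + bonus`. Under-sampled arms then win occasionally, which is the exploration a purely greedy selector never does.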

  1. **Whether to keep numeric EMA at all.** The 2026-05-07 roadmap
     (Phase 4) puts re-evaluating bandit learning on hold until the
     SLM-driven dispatcher is in production. Three options are on the
     table:
     - Keep the bandit as feedback for the SLM.
     - Retire EMA in favour of qualitative outcome summaries fed to the SLM.
     - Split responsibilities (SLM = intent routing, bandit =
       cost/quality within a tier).

     See
     [`docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md)
     §Phase 4.

  2. **User-tunable selector knobs.** Several constants are
     hardcoded today: `qualityAlpha` (EMA smoothing, ~3-sample
     memory), the 70/30 observed/heuristic blend,
     `strengthScoreBonus` for tagged task types, and the
     `DefaultThresholds.Minimum` quality floor. Surfacing these as
     `[router.bandit]` config keys would let users tune for their
     workloads without waiting for the strategic decision in #1. For
     example, a higher alpha suits shifting model performance, and longer
     memory suits stable fleets.
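
     The sketch below decodes a `[router.bandit]` section without reintroducing zero-spam. Every knob is a pointer, so an omitted key falls back to the built-in constant and only explicit values override it. Key names, defaults, and ranges are hypothetical, and quality scores are assumed to lie in [0, 1].

     ```go
     package main

     import (
     	"errors"
     	"fmt"
     )

     // BanditConfig is a hypothetical [router.bandit] section. A nil field means
     // the key was absent, so the built-in default applies.
     type BanditConfig struct {
     	QualityAlpha   *float64 `toml:"quality_alpha"`
     	ObservedWeight *float64 `toml:"observed_weight"`
     	StrengthBonus  *float64 `toml:"strength_bonus"`
     	MinQuality     *float64 `toml:"min_quality"`
     }

     // Resolved holds the effective values after defaults are applied.
     type Resolved struct {
     	QualityAlpha, ObservedWeight, StrengthBonus, MinQuality float64
     }

     func pick(v *float64, def float64) float64 {
     	if v == nil {
     		return def
     	}
     	return *v
     }

     // Resolve applies defaults and rejects values outside their meaningful range.
     func (c BanditConfig) Resolve(def Resolved) (Resolved, error) {
     	r := Resolved{
     		QualityAlpha:   pick(c.QualityAlpha, def.QualityAlpha),
     		ObservedWeight: pick(c.ObservedWeight, def.ObservedWeight),
     		StrengthBonus:  pick(c.StrengthBonus, def.StrengthBonus),
     		MinQuality:     pick(c.MinQuality, def.MinQuality),
     	}
     	if r.QualityAlpha <= 0 || r.QualityAlpha > 1 {
     		return Resolved{}, errors.New("quality_alpha must be in (0, 1]")
     	}
     	if r.ObservedWeight < 0 || r.ObservedWeight > 1 {
     		return Resolved{}, errors.New("observed_weight must be in [0, 1]")
     	}
     	if r.MinQuality < 0 || r.MinQuality > 1 {
     		return Resolved{}, errors.New("min_quality must be in [0, 1]")
     	}
     	return r, nil
     }

     func main() {
     	// Illustrative built-in defaults, not gnoma's real constants.
     	defaults := Resolved{QualityAlpha: 0.5, ObservedWeight: 0.7, StrengthBonus: 0.1, MinQuality: 0.3}
     	fast := 0.8 // quicker adaptation to shifting model performance
     	r, err := BanditConfig{QualityAlpha: &fast}.Resolve(defaults)
     	fmt.Printf("%+v %v\n", r, err)
     }
     ```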

  Surfaced from the r/coolgithubprojects v0.3.1 launch thread
  (2026-05-24, `u/Ha_Deal_5079`). The encoder + contextual bandit
  alternative is now sketched in
  [`docs/superpowers/plans/2026-05-25-encoder-bandit-router.md`](docs/superpowers/plans/2026-05-25-encoder-bandit-router.md);
  that plan supersedes #1 above when it ships.

- **Security boundary — egress controls + session audit log.** The
  current `Firewall` is a content boundary only. It scans messages and
  tool results for secrets via regex + Shannon entropy, redacts or
  blocks them, and logs via `log/slog`. It does not enforce network egress:
  outgoing HTTP from tools and providers uses a stock `http.Client`
  with no per-host allowlist or dial-layer interception. Two follow-ups
  surfaced from the r/SideProject v0.3.0 launch thread
  (2026-05-24, `u/Secret_Theme3192`):

  1. **Per-session audit log of blocked/redacted events.** This would be a
     grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
     user can answer "what did the firewall do this session?" in
     one command. Today the `slog` output goes to whatever sink is
     configured, with no per-session grouping.
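
     This is a minimal sketch of the per-session audit file, assuming one JSON object per line. `AuditEvent`, its fields, and `appendAudit` are hypothetical. The record describes what fired and where. It deliberately does not carry the matched secret, because the log itself would otherwise leak it.

     ```go
     package main

     import (
     	"encoding/json"
     	"os"
     	"path/filepath"
     	"time"
     )

     // AuditEvent is a hypothetical record shape. It never stores the secret itself.
     type AuditEvent struct {
     	Time   time.Time `json:"time"`
     	Action string    `json:"action"` // e.g. "redacted" or "blocked"
     	Source string    `json:"source"` // which message or tool result was scanned
     	Rule   string    `json:"rule"`   // pattern or entropy rule that fired
     }

     // appendAudit appends one event to <root>/.gnoma/sessions/<id>/audit.jsonl,
     // creating the directory on first use. O_APPEND keeps earlier lines intact.
     func appendAudit(root, sessionID string, ev AuditEvent) error {
     	dir := filepath.Join(root, ".gnoma", "sessions", sessionID)
     	if err := os.MkdirAll(dir, 0o700); err != nil {
     		return err
     	}
     	f, err := os.OpenFile(filepath.Join(dir, "audit.jsonl"), os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
     	if err != nil {
     		return err
     	}
     	// Encode writes the object followed by a newline, which yields JSONL.
     	if err := json.NewEncoder(f).Encode(ev); err != nil {
     		f.Close()
     		return err
     	}
     	return f.Close()
     }

     func main() {
     	root, err := os.MkdirTemp("", "audit-demo")
     	if err != nil {
     		panic(err)
     	}
     	defer os.RemoveAll(root)

     	ev := AuditEvent{Time: time.Now().UTC(), Action: "redacted", Source: "tool:bash", Rule: "entropy"}
     	if err := appendAudit(root, "demo-session", ev); err != nil {
     		panic(err)
     	}
     	data, err := os.ReadFile(filepath.Join(root, ".gnoma", "sessions", "demo-session", "audit.jsonl"))
     	if err != nil {
     		panic(err)
     	}
     	os.Stdout.Write(data)
     }
     ```

     With this layout, `grep '"action":"blocked"' .gnoma/sessions/<id>/audit.jsonl` answers the question in one command.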

  2. **Per-host egress allowlist (HTTP transport layer).** Open
     design question: host-level (`allow api.openai.com, deny *`)
     vs per-tool (`bash can only hit these hosts`). The reply asked
     the commenter for their mental model; revisit when feedback
     lands. The README and the v0.3.0 Reddit post oversold
     "network egress gated"; the wording was corrected in the same commit as this
     TODO entry.
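
     For the host-level option, the sketch below enforces the allowlist at the `http.RoundTripper` layer. Type names are hypothetical, and a stub transport stands in for the network. It only covers traffic sent through a `net/http` client configured with it. A subprocess launched by the `bash` tool does not go through it at all.

     ```go
     package main

     import (
     	"errors"
     	"fmt"
     	"io"
     	"net/http"
     	"strings"
     )

     var errEgressDenied = errors.New("egress denied by host allowlist")

     // allowlistTransport allows requests only to listed hosts ("allow X, deny *").
     type allowlistTransport struct {
     	base    http.RoundTripper
     	allowed map[string]bool // lower-case host names, without port
     }

     func (t allowlistTransport) RoundTrip(req *http.Request) (*http.Response, error) {
     	host := strings.ToLower(req.URL.Hostname())
     	if !t.allowed[host] {
     		if req.Body != nil {
     			req.Body.Close() // RoundTripper must close the body, even on error
     		}
     		return nil, fmt.Errorf("%w: %s", errEgressDenied, host)
     	}
     	return t.base.RoundTrip(req)
     }

     // stubTransport answers every request locally so the demo needs no network.
     type stubTransport struct{}

     func (stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
     	return &http.Response{
     		StatusCode: http.StatusOK,
     		Header:     make(http.Header),
     		Body:       io.NopCloser(strings.NewReader("ok")),
     		Request:    req,
     	}, nil
     }

     func main() {
     	client := &http.Client{Transport: allowlistTransport{
     		base:    stubTransport{},
     		allowed: map[string]bool{"api.openai.com": true},
     	}}
     	for _, u := range []string{"https://api.openai.com/v1/models", "https://example.com/"} {
     		resp, err := client.Get(u)
     		if err != nil {
     			fmt.Println("blocked:", u, errors.Is(err, errEgressDenied))
     			continue
     		}
     		body, _ := io.ReadAll(resp.Body)
     		resp.Body.Close()
     		fmt.Println("allowed:", u, string(body))
     	}
     }
     ```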

- **Tool-router specialization (functiongemma)** — gated on telemetry,
  not committed. Phase A.2 adds did-switch-rate measurement to the
  two-stage `select_category` path. Phase A.3 (a LoRA fine-tune of
  `functiongemma-270m-it` as a dedicated `ArmRoleToolRouter`) only
  fires if the did-switch rate exceeds 20 %. Three independent external
  reviews were consulted on 2026-05-23; their consensus is "fits as tool-call
  router, not chat; fine-tuning mandatory; prove the need first."
  See
  [`docs/superpowers/plans/2026-05-23-tool-router-specialization.md`](docs/superpowers/plans/2026-05-23-tool-router-specialization.md).
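
  The Phase A.3 gate reduces to a rate over recorded turns. The sketch below treats each two-stage turn as one boolean did-switch observation. The `minSamples` guard is an added assumption and is not part of the plan.

  ```go
  package main

  import "fmt"

  // switchRateThreshold is the 20 % trigger from the plan.
  const switchRateThreshold = 0.20

  // shouldFineTune returns the observed did-switch rate and whether Phase A.3
  // should fire. minSamples is a hypothetical guard against tiny samples.
  func shouldFineTune(switched []bool, minSamples int) (float64, bool) {
  	if len(switched) == 0 {
  		return 0, false
  	}
  	n := 0
  	for _, s := range switched {
  		if s {
  			n++
  		}
  	}
  	rate := float64(n) / float64(len(switched))
  	return rate, len(switched) >= minSamples && rate > switchRateThreshold
  }

  func main() {
  	turns := make([]bool, 40)
  	for i := 0; i < 10; i++ {
  		turns[i] = true // 10 of 40 turns recorded a switch
  	}
  	rate, fire := shouldFineTune(turns, 30)
  	fmt.Printf("did-switch rate %.0f%%, fine-tune: %v\n", rate*100, fire)
  }
  ```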
- **Entropy FP reduction (post-SLM Phase F)** — F-1 (format-aware
  pre-extractor) shipped 2026-05-22. It adds `[security].entropy_safelist`,
  which accepts `uuid`, `sha_hex`, `iso8601`, and `url`. The list defaults to empty, so
  pre-F-1 behaviour is unchanged. F-2 (an SLM-assisted classifier for
  ambiguous entropy hits) remains gated on F-1 false-positive-rate telemetry
  from real workloads plus ≥50 SLM observations. Surfaced from the
  r/ollama launch thread (2026-05-20); alterlab.io independently validated
  the same tiered approach. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
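
  The sketch below shows the tiering F-1 describes. Format checks run first, and only tokens that match no enabled safelist format reach the Shannon entropy test, H = −Σ p(c) log₂ p(c) in bits per character. The regexes cover two of the four formats. Like the 3.0-bit threshold, they are illustrative rather than gnoma's actual values. An empty safelist reduces to the plain entropy check, matching the pre-F-1 default.

  ```go
  package main

  import (
  	"fmt"
  	"math"
  	"regexp"
  )

  // shannonEntropy returns bits per character: H = -Σ p(c) log2 p(c).
  func shannonEntropy(s string) float64 {
  	counts := map[rune]int{}
  	total := 0
  	for _, r := range s {
  		counts[r]++
  		total++
  	}
  	h := 0.0
  	for _, c := range counts {
  		p := float64(c) / float64(total)
  		h -= p * math.Log2(p)
  	}
  	return h
  }

  // Hypothetical patterns for two of the safelist formats.
  var safelistPatterns = map[string]*regexp.Regexp{
  	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`),
  	"sha_hex": regexp.MustCompile(`^[0-9a-fA-F]{40}$|^[0-9a-fA-F]{64}$`),
  }

  // flagToken skips tokens matching an enabled safelisted format, then flags
  // anything whose entropy reaches the threshold.
  func flagToken(tok string, enabled []string, threshold float64) bool {
  	for _, name := range enabled {
  		if re, ok := safelistPatterns[name]; ok && re.MatchString(tok) {
  			return false
  		}
  	}
  	return shannonEntropy(tok) >= threshold
  }

  func main() {
  	const threshold = 3.0 // illustrative
  	u := "123e4567-e89b-12d3-a456-426614174000"
  	fmt.Printf("entropy %.2f bits/char\n", shannonEntropy(u))
  	fmt.Println("flagged, empty safelist:", flagToken(u, nil, threshold))
  	fmt.Println("flagged, uuid safelisted:", flagToken(u, []string{"uuid"}, threshold))
  }
  ```

  That UUID scores about 3.7 bits per character, which is enough to trip a 3.0 threshold. With `uuid` enabled, it is skipped before the entropy check ever runs.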
- **Compound tools (post-SLM Phase E)** — held until ≥50 SLM
  observations inform which primitives are worth adding. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
- **Sensitive-content handling — unified policy.** Three input paths
  can introduce sensitive content into the context:
  - Pasted images: screenshots may contain secrets, API keys, or PII.
  - Pasted text: often copied straight from a terminal along with credentials.
  - Tool-read files: `.env`, key files, etc.

  Today these paths are handled inconsistently. Incognito gates persistence,
  but content still flows to providers. The outgoing-scan firewall covers
  some patterns but is format-aware only for text. The goal is a single
  policy/UI with three parts:
  - An at-paste warning when the content matches sensitive heuristics.
  - A consent-gated review step.
  - Consistent treatment across the three paths.

  This cross-cuts the Phase F entropy work and the outgoing-scan firewall. Plan:
  [`docs/superpowers/plans/2026-05-24-sensitive-content-policy.md`](docs/superpowers/plans/2026-05-24-sensitive-content-policy.md).
- **Distribution — follow-ups.** v0.1.0 shipped (archives on
  github.com/VikingOwl91/gnoma/releases, multi-arch images on
  ghcr.io/vikingowl91/gnoma). Still optional:
  - Homebrew tap.
  - `curl | sh` installer script.
  - Signed checksums (cosign/sigstore).
  - Release note automation.
  - Windows process-tree kill via golang.org/x/sys/windows job objects
    (currently `os.Process.Kill` only; see `internal/mcp/transport_windows.go`).
  - Migration from `dockers` + `docker_manifests` to `dockers_v2` in
    `.goreleaser.yml`. This collapses ~45 lines into one block but
    requires Dockerfile changes for the per-platform binary layout, so it
    is deferred to its own commit before v0.3.0.

## Stable backlog (not in active phases)

- **Thinking mode** (disabled / budget / adaptive) — M12.
- **Structured output** with JSON schema validation — M12.
- **Native agy JSON output** — switch the subprocess provider to
  `--output-format stream-json` once the agy CLI supports it,
  replacing the current prompt-augmentation fallback. Until then,
  agy's `ToolUse` capability is set to `false` (see the agy entry in
  `internal/provider/subprocess/agent.go`). Without
  structured tool-call output, the router would dispatch
  tool-needing tasks to agy, and the turn would hang on prose
  hallucinations of tool calls. Flip the capability back to `true`
  in the same change that lands stream-json parsing.
- **SQLite session persistence** + serve mode — M10.
- **Task learning** (pattern recognition, persistent tasks) — M11.
- **Web UI** (`gnoma web`) — M15.
- **OAuth / keyring** — M13.
- **Observability** (feature flags, cost dashboards) — M14.
- **PE / Mach-O support** — future, after ELF Phase 6.

## History

Completed initiatives, kept here as pointers to their plan files:

- **v0.1.0 release** — 2026-05-20. First tagged release. The GoReleaser
  pipeline produces six static archives (linux/darwin/windows ×
  amd64/arm64) on the GitHub mirror plus multi-arch Docker images on
  GHCR. History was rewritten the same day to migrate authorship to
  a noreply identity and strip co-author attribution.

- **Post-audit security hardening** — complete 2026-05-19. Three waves
  plus one ADR closed all 14 findings from the external review:
  - [Wave 1 — SafeProvider boundary](docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md)
  - [Wave 2 — Incognito coherence](docs/superpowers/plans/2026-05-19-security-wave2-incognito.md)
  - Wave 3 — scanner + path hygiene (rolled out directly without a
    plan file; see commits leading up to 2026-05-19 on `internal/security`)
  - [ADR-004 — PostToolUse hook ordering](docs/essentials/decisions/004-posttooluse-hook-ordering.md)
- **Post-SLM unlock** —
  [plan](docs/superpowers/plans/2026-05-19-post-slm-unlock.md). Phases
  A–D complete (two-stage tool routing, CLI agent binary override,
  user profiles, per-arm capability tags).
- **2026-05-07 roadmap** —
  [plan](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md). M1–M8
  done; SLM classifier (Phase 3) complete; Phase 4 superseded by the
  post-SLM plan.

## Reference

- Milestones: `docs/essentials/milestones.md`
- Decisions: `docs/essentials/decisions/`
- ADR-002 (SLM routing, supersedes earlier ADR-009): `docs/essentials/decisions/002-slm-routing.md`