24 Commits

Author SHA1 Message Date
vikingowl 24945b1eb2 docs(plans): encoder + contextual-bandit router architecture
Captures the architectural research surfaced during the 2026-05-25
SLM-failure diagnostic session: RouteLLM treats routing as
classification, ModernBERT is well-suited to that classification, and
FunctionGemma fits as an optional JSON-sanity layer rather than the
primary classifier. The current decoder-SLM-as-classifier design is
the wrong shape (100% failure rate observed across two model swaps).

Five-phase plan:
  1. Embedding feature scaffold (near-term, additive, opt-in)
  2. Contextual bandit (LinUCB / Thompson) over the feature set
  3. Retire the decoder-SLM classifier once phase 2 outperforms it
  4. ModernBERT fine-tune on the accumulated labelled data
  5. FunctionGemma JSON sanity layer (optional final stage)
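
For readers unfamiliar with phase 2, here is a minimal LinUCB sketch. It
is not the plan's implementation: the feature vector, the alpha value,
and every identifier below are assumptions made for illustration. Each
arm keeps an inverse design matrix (updated with the Sherman-Morrison
identity) and a reward vector, and the selector picks the arm with the
highest upper confidence bound for the current context.

```go
package main

import (
	"fmt"
	"math"
)

// linUCBArm holds per-arm state for a ridge-regression reward model.
type linUCBArm struct {
	aInv [][]float64 // inverse of (I + sum of x*x^T)
	b    []float64   // sum of reward*x
}

func newArm(d int) *linUCBArm {
	a := make([][]float64, d)
	for i := range a {
		a[i] = make([]float64, d)
		a[i][i] = 1 // start from the identity matrix
	}
	return &linUCBArm{aInv: a, b: make([]float64, d)}
}

func matVec(m [][]float64, v []float64) []float64 {
	out := make([]float64, len(m))
	for i, row := range m {
		for j, val := range row {
			out[i] += val * v[j]
		}
	}
	return out
}

func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// score returns the estimated reward plus an exploration bonus.
func (a *linUCBArm) score(x []float64, alpha float64) float64 {
	theta := matVec(a.aInv, a.b)
	return dot(theta, x) + alpha*math.Sqrt(dot(x, matVec(a.aInv, x)))
}

// update folds in one (context, reward) observation.
func (a *linUCBArm) update(x []float64, reward float64) {
	aInvX := matVec(a.aInv, x)
	denom := 1 + dot(x, aInvX)
	for i := range a.aInv {
		for j := range a.aInv[i] {
			a.aInv[i][j] -= aInvX[i] * aInvX[j] / denom
		}
	}
	for i := range a.b {
		a.b[i] += reward * x[i]
	}
}

// selectArm picks the highest-scoring arm; ties go to the lowest index.
func selectArm(arms []*linUCBArm, x []float64, alpha float64) int {
	best, bestScore := 0, math.Inf(-1)
	for i, arm := range arms {
		if s := arm.score(x, alpha); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	arms := []*linUCBArm{newArm(2), newArm(2)}
	x := []float64{1, 0} // hypothetical two-feature context
	for i := 0; i < 20; i++ {
		arms[0].update(x, 0) // arm 0 keeps failing on this context
		arms[1].update(x, 1) // arm 1 keeps succeeding
	}
	fmt.Println("selected arm:", selectArm(arms, x, 0.5))
}
```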

Phase 1 is the only piece scoped for near-term implementation; the
rest is multi-month and hinges on the strategic 'EMA vs SLM'
question already tracked in TODO.

Cross-references the existing tool-router-specialization plan so a
reader of either lands on both. Updates the TODO entry for the
bandit selector to note the supersession path.
2026-05-25 01:22:18 +02:00
vikingowl fa65a68728 docs(plans): config-migration and sensitive-content-policy
Promotes two TODO entries into phased plan docs and links them
from the TODO bullets.

config-migration plan covers the silent layered-config corruption
chain (encoder zero-spam -> reader overwrite -> wrong effective
values) and its remediation across five phases: encoder fix
(omitempty + pointer-numeric hybrid), project registry, gnoma
doctor, gnoma upgrade-config, and auto-migration on startup with
banner notice.

sensitive-content-policy plan unifies three input paths (pasted
text, pasted images, tool-read files) behind one decision API
with consistent UI surface and audit-log integration. Phases A-E
sequence the work from highest-leverage (text paste) to most
complex (image OCR with local vision arm).

Neither plan starts implementation in this commit — they exist to
make the design decisions explicit so the eventual code can be
reviewed against a written intent rather than a TODO bullet.
2026-05-24 22:51:33 +02:00
vikingowl b13a6a2801 docs(plans): mark v0.3.0 plans shipped
Three plans shipped end-to-end in v0.3.0; removing them from
TODO.md In-flight and adding a Status: shipped header to each
plan doc with the commit references.

Shipped:
- 2026-05-23-routing-defaults-refresh.md
- 2026-05-23-prefer-routing-policy.md
- 2026-05-23-startup-safety-banner.md

Still in flight (telemetry-gated, fires only if measurements
support it):
- 2026-05-23-tool-router-specialization.md
2026-05-23 22:45:05 +02:00
vikingowl c483656681 docs(plans): fix gnoma one-shot invocation in safety-banner plan
gnoma takes the prompt as a positional argument, not via -p (that's
Claude Code's syntax). Surfaced when the maintainer tried the
manual smoke from the plan's "Definition of done" section and hit
the "flag provided but not defined: -p" error.

  before: gnoma -p "test"
  after:  gnoma "test"

The same wrong syntax appears in the f9094f6 / 3eeb5b4 commit
messages but those are immutable. This commit also serves as the
public record of the typo so future readers don't repeat it.
2026-05-23 22:26:56 +02:00
vikingowl d206b3cf09 docs: routing-prefer + startup-safety user docs, plan tier-shift note
README:
- New "Preferring local vs cloud" subsection under "Routing
  defaults" — table of the three [router].prefer values, priority
  order against forced arm / incognito / Strengths, and the
  CLI-agent-counts-as-local clarification.
- New "Startup safety check" subsection under "Security" — tier
  table, [safety] config block, --dangerously-allow-anywhere flag,
  container detection note, link to the plan doc.

Plan doc (prefer-routing-policy):
- Approach section updated to describe the tier-shift mechanism
  that actually shipped, with a clear "Implementation note"
  explaining why the original score-multiplier approach was
  abandoned (cost-floor math gives local arms a ~280x raw-score
  advantage that any reasonable multiplier can't overcome).
- CLI-agent placement flipped from "non-local" to "local" with
  rationale — implementation chose user-facing behavior axis over
  the privacy axis the original draft used.
- Tier-shift rationale table replacing the multiplier rationale.
- P-3 task rewritten to reflect the actual implementation (checked
  off and pointing at the right code), with the policyMultiplier
  helper noted as a within-tier nudge of limited present effect.

The implementation-vs-plan deviation is now documented in both the
plan doc and the original feature commit message (f9094f6). Future
readers reach the same understanding via either path.
2026-05-23 22:23:57 +02:00
vikingowl 162c8b1017 docs(plans): prefer-routing-policy and startup-safety-banner
Two parallel pre-flight plans surfaced in the 2026-05-23 session,
both deferred while the routing-defaults-refresh implementation
landed. Drafted as separate plans because they're independent:
the prefer-policy is a router scoring change; the safety banner is
a launch-time check that never touches the router.

prefer-routing-policy
  [router].prefer = "local" | "cloud" | "auto" — soft score
  multiplier (0.3 / 0.5 / 1.0) biasing toward local or cloud arms
  while preserving Strengths cross-tier promotion and bandit
  learning. Default "auto" is byte-identical to current behavior.
  Forced arms and incognito retain priority. CLI-agent subprocess
  arms count as non-local for this knob (they proxy to cloud).

startup-safety-banner
  Three-tier cwd classification at launch — refuse in /etc /sys
  and other system roots; warn+keypress in $HOME, /tmp, ~/Desktop,
  ~/Downloads; OK inside any git repo or directory with a project
  marker (.gnoma/, go.mod, package.json, etc.). Always shows a
  context banner with cwd, git state, model, modes, and a
  top-level sensitive-file inventory (.env, id_rsa, *.pem, .ssh/,
  etc. — informational only, no recursion, capped at 1000 entries).
  Bypass via --dangerously-allow-anywhere. Complements the in-flight
  sensitive-content unified-policy TODO item: this is the pre-flight
  layer, that is the runtime input-path layer.
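
The three-tier classification could look roughly like the sketch below.
It is illustrative only: the function names are hypothetical, the
project-marker check is reduced to a boolean, and two points the plan
summary leaves open are filled with labelled assumptions (system roots
win over project markers, and any unlisted directory without a marker
warns).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

type tier int

const (
	tierOK tier = iota
	tierWarn
	tierRefuse
)

// systemRoots is illustrative: the plan names /etc and /sys and leaves
// the remaining system roots unspecified.
var systemRoots = []string{"/etc", "/sys"}

// under reports whether dir is root itself or nested below it.
func under(dir, root string) bool {
	return dir == root || strings.HasPrefix(dir, root+"/")
}

// classify maps a working directory to a tier. hasProjectMarker stands
// in for "inside a git repo or next to .gnoma/, go.mod, package.json".
func classify(cwd string, hasProjectMarker bool) tier {
	cwd = path.Clean(cwd)
	for _, root := range systemRoots {
		if under(cwd, root) {
			return tierRefuse // assumption: system roots always refuse
		}
	}
	if hasProjectMarker {
		return tierOK
	}
	// Assumption: everything else ($HOME, /tmp, ~/Desktop, ...) warns.
	return tierWarn
}

func main() {
	fmt.Println(classify("/etc/nginx", true) == tierRefuse)
	fmt.Println(classify("/home/u/src/app", true) == tierOK)
	fmt.Println(classify("/tmp", false) == tierWarn)
}
```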

Both plans default-on with safe defaults; both have explicit
out-of-scope sections to prevent scope creep during implementation.
Linux + macOS first; Windows path classification deferred.

TODO.md surfaces both as in-flight.
2026-05-23 22:00:21 +02:00
vikingowl a79e99199d feat(router): non-chat exclude, vision prefixes, family-defaults scaffold
Discovery previously registered every model returned by Ollama as a
chat arm, including embeddings, ASR, TTS, audio realtime, and
rerankers — which then failed at inference time when the router
selected them. Local arms also shipped with all-zero defaults, so
selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b
was effectively random.

This change covers tasks R-1, R-2, R-6 from the routing-defaults plan.

- nonChatModelPatterns + isNonChatModel substring matcher; matched
  IDs are skipped during RegisterDiscoveredModels. Covers whisper,
  moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding,
  embeddinggemma, -reranker, lfm2.
- knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3
  and minicpm-v entries stay for regression coverage.
- New internal/router/defaults.go with FamilyDefaults struct,
  knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix
  lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b
  resolves to "tiny3.5"). Single entry for now: functiongemma is
  registered with Disabled=true and MaxComplexity=0.40, reserved for
  the future ArmRoleToolRouter path. Table will grow in R-3.
- RegisterDiscoveredModels consults ResolveFamilyDefaults and only
  populates fields that are still zero on the arm, so user [[arms]]
  overrides keep priority.
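
A rough sketch of the two lookups described above, for orientation
only. The pattern list is copied from this message; the function
bodies, the return shape of ResolveFamilyDefaults, and the "tiny" and
"tiny3.5" table entries are assumptions (the commit itself registers
only functiongemma).

```go
package main

import (
	"fmt"
	"strings"
)

var nonChatModelPatterns = []string{
	"whisper", "moonshine", "kokoros", "vibevoice", "-asr", "-tts",
	"-audio", "-embedding", "embeddinggemma", "-reranker", "lfm2",
}

// isNonChatModel reports whether any pattern occurs anywhere in the ID.
func isNonChatModel(id string) bool {
	for _, p := range nonChatModelPatterns {
		if strings.Contains(id, p) {
			return true
		}
	}
	return false
}

type FamilyDefaults struct {
	Disabled      bool
	MaxComplexity float64
}

var knownFamilyDefaults = map[string]FamilyDefaults{
	"functiongemma": {Disabled: true, MaxComplexity: 0.40},
	"tiny":          {MaxComplexity: 0.20}, // hypothetical entry
	"tiny3.5":       {MaxComplexity: 0.30}, // hypothetical entry
}

// ResolveFamilyDefaults strips an org/ namespace, then returns the
// longest table key that is a prefix of the remaining model ID.
func ResolveFamilyDefaults(id string) (string, FamilyDefaults, bool) {
	if i := strings.LastIndex(id, "/"); i >= 0 {
		id = id[i+1:]
	}
	bestKey := ""
	for key := range knownFamilyDefaults {
		if strings.HasPrefix(id, key) && len(key) > len(bestKey) {
			bestKey = key
		}
	}
	if bestKey == "" {
		return "", FamilyDefaults{}, false
	}
	return bestKey, knownFamilyDefaults[bestKey], true
}

func main() {
	fmt.Println(isNonChatModel("qwen3-embedding:0.6b"))
	fmt.Println(ResolveFamilyDefaults("reecdev/tiny3.5:1.5b"))
}
```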

Plans:
- docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
- docs/superpowers/plans/2026-05-23-tool-router-specialization.md

TODO.md surfaces both as in-flight items.
2026-05-23 21:24:59 +02:00
vikingowl 49d80cf847 feat(security): format-aware entropy safelist (Phase F-1)
Add a deterministic pre-extractor that skips known-safe token shapes
before they reach the entropy scorer. Targets the false-positive
regime that bites under lowered entropy_threshold or
redact_high_entropy = true: UUIDs (~3.4 bits per character), SHA hex
digests (~3.9 bits per character), ISO-8601 timestamps, and HTTP(S) URLs.

Config knob lives under the existing security section to match
entropy_threshold / redact_high_entropy convention:

  [security]
  entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"]

Empty / unset preserves pre-F-1 behaviour exactly — users opt in.
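
To make the mechanism concrete, here is a small sketch of a per-character
Shannon entropy scorer with a shape-based pre-check in front of it. The
regular expressions and function names are assumptions, not the
scanner's actual code, and only two of the four safelist names are
shown.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// shannonBitsPerChar returns the Shannon entropy of s in bits per character.
func shannonBitsPerChar(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// safelistPatterns is illustrative; the real patterns may differ.
var safelistPatterns = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^([0-9a-fA-F]{40}|[0-9a-fA-F]{64})$`),
}

// safelisted returns the name of the first enabled pattern that matches.
// An empty enabled list never matches, so nothing is skipped by default.
func safelisted(token string, enabled []string) (string, bool) {
	for _, name := range enabled {
		if re, ok := safelistPatterns[name]; ok && re.MatchString(token) {
			return name, true
		}
	}
	return "", false
}

func main() {
	token := "123e4567-e89b-12d3-a456-426614174000"
	if name, ok := safelisted(token, []string{"uuid", "sha_hex"}); ok {
		// Log the pattern name and length only, never the token bytes.
		fmt.Printf("skip pattern=%s len=%d\n", name, len(token))
		return
	}
	fmt.Printf("score %.2f bits/char\n", shannonBitsPerChar(token))
}
```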

Per-pattern Debug telemetry fires on every skip (pattern name +
token length, never the token bytes). This is the data F-2's
go/no-go gate depends on; the plan literally specifies it.

NewFirewall validates names at the config boundary and emits a
Warn for unknown entries so a typo like "uid" instead of "uuid"
surfaces loudly instead of silently disabling FP reduction.

Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold,
mixed payload (safe shape + real secret) preserves the secret,
secret-adjacent-to-UUID regression guard, empty safelist preserves
pre-F-1 behaviour, unknown name silently dropped at scanner level
but warned at firewall level, end-to-end FirewallConfig wiring,
and the skip-telemetry log line.

F-2 remains gated on real-workload FP-rate observations.
2026-05-22 12:39:10 +02:00
vikingowl 7d0e35b0f4 docs: record Phase F external validation, surface in active TODOs
2026-05-20 19:15:49 +02:00
vikingowl 8d6e66533b docs(plans): add Phase F entropy FP reduction to post-SLM plan
2026-05-20 10:06:43 +02:00
vikingowl 3ae40083f1 docs(security): Wave 2 plan — incognito coherence
Plan for the second hardening wave. Six findings closed in one PR:
W2-1 router rejects forced non-local under local-only; W2-2 persist
store consults IncognitoMode + 0o600/0o700 perms; W2-3 TUI seeds
incognito from firewall; W2-4 quality/outcome gates read firewall
instead of CLI flag; W2-5 session perms 0o600; W2-6 remove dead
IncognitoMode.LocalOnly field.
2026-05-19 22:44:20 +02:00
vikingowl 8dcca64e41 feat(security): add SafeProvider boundary wrapper (W1-1)
Introduces internal/security/SafeProvider — a provider.Provider decorator
that scans outgoing messages and the system prompt through the firewall
before delegating to the inner provider. Tool-result redaction stays in
the engine because it needs per-tool context the boundary lacks.

FirewallRef provides a late-binding atomic.Pointer[Firewall] so the
wrapper can be installed before NewFirewall runs in main. A nil or
unset ref makes SafeProvider a pass-through — preserves the current
init order without lock contention or panics.
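
The late-binding idea can be sketched as follows. The Provider
interface, the Firewall.Scan method, and the redaction behaviour are
simplified stand-ins invented for this example; only the use of
atomic.Pointer and the nil-means-pass-through rule come from the
description above.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// Firewall is a stand-in that redacts one known secret.
type Firewall struct{ secret string }

func (f *Firewall) Scan(msg string) string {
	return strings.ReplaceAll(msg, f.secret, "[REDACTED]")
}

// FirewallRef can be handed out before the firewall exists.
type FirewallRef struct{ p atomic.Pointer[Firewall] }

func (r *FirewallRef) Set(f *Firewall) { r.p.Store(f) }

// Get tolerates a nil ref and an unset pointer alike.
func (r *FirewallRef) Get() *Firewall {
	if r == nil {
		return nil
	}
	return r.p.Load()
}

// Provider is a hypothetical one-method stand-in for provider.Provider.
type Provider interface{ Send(msg string) string }

type echoProvider struct{}

func (echoProvider) Send(msg string) string { return msg }

// SafeProvider scans outgoing text before delegating to the inner provider.
type SafeProvider struct {
	inner Provider
	ref   *FirewallRef
}

func (s SafeProvider) Send(msg string) string {
	if fw := s.ref.Get(); fw != nil {
		msg = fw.Scan(msg)
	}
	return s.inner.Send(msg) // pass-through when no firewall is bound
}

func main() {
	ref := &FirewallRef{}
	p := SafeProvider{inner: echoProvider{}, ref: ref}
	fmt.Println(p.Send("token=hunter2")) // firewall not bound yet
	ref.Set(&Firewall{secret: "hunter2"})
	fmt.Println(p.Send("token=hunter2")) // now scanned
}
```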

Wave 1 of the post-audit hardening plan
(docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md).
Closes the architectural critique that secret scanning only ran inside
engine.buildRequest(), leaving SLM/summarizer/hook/routerStreamer paths
to send raw payloads. This commit only ships the wrapper; W1-2 and W1-3
will wire it through main and the four bypass sites.
2026-05-19 22:28:46 +02:00
vikingowl d84b295da2 feat(tui): /profile slash command + status-bar profile badge (Phase C-3)
Adds the in-TUI surface for the profile system:

- Status bar carries " · profile: <name>" next to the SLM badge when
  profile mode is engaged (renders nothing in legacy single-config
  installations).
- /profile (no args) shows the active profile and lists available ones.
- /profile <name> switches by re-executing gnoma via syscall.Exec under
  --profile <name>. Critical cleanups (quality.json snapshot, SLM
  backend Close, session.Close) fire explicitly before exec since
  defers don't run after exec replaces the process image. Using
  syscall.Exec rather than a child process avoids stacking a process
  level on every switch and propagates the new gnoma's exit code
  directly to the shell.
- Autocomplete after "/profile " offers configured profile names; the
  completion source is threaded from main.go via tui.Config.

Conversation history is not preserved across a switch — profile change
implies different context, different keys, different permission mode,
so a clean reset is the correct semantic.
2026-05-19 21:59:11 +02:00
vikingowl 8450005b31 feat(cli): gnoma profile list/show subcommands (Phase C-2)
`profile list` enumerates configured profiles and marks default + active.
`profile show <name>` prints the merged effective config the profile
would produce — sections, configured key names (values never), CLI
agent overrides, arms, hooks, MCP servers, per-profile quality and
session paths.

Both commands work as a recovery affordance when profile resolution
is broken: list flags a missing-default explicitly with
"<name> (default, missing)", and the dispatcher falls back to a
base-only load (new gnomacfg.LoadBase) so the diagnostics still run.

API key values are filtered out of `profile show` — the output is safe
to paste in a help channel or attach to a bug report.
2026-05-19 21:44:50 +02:00
vikingowl 635dad660c feat(config): per-profile config layering with --profile flag (Phase C-1)
Adds opt-in user profiles for swapping API keys, CLI binaries, and
permission modes between contexts (work/private/experiment/...).

Profile mode engages only when ~/.config/gnoma/profiles/ exists, so
existing single-config installations are untouched. Selection order:
--profile flag → default_profile in base config → fatal error.

Layering: defaults → ~/.config/gnoma/config.toml → profiles/<name>.toml
→ <projectRoot>/.gnoma/config.toml → env. Map sections merge per-key;
[[arms]] and [[mcp_servers]] merge by id/name; [[hooks]] appends.
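
The three merge rules can be illustrated with the sketch below. The
types and helper names are hypothetical, and one detail is assumed
rather than stated here: an overlay entry with a matching id replaces
the base entry wholesale instead of merging field by field.

```go
package main

import "fmt"

type Arm struct {
	ID         string
	CostWeight float64
}

// mergeMap overlays per key: later layers win, untouched keys survive.
func mergeMap(base, overlay map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(overlay))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range overlay {
		out[k] = v
	}
	return out
}

// mergeArms merges by ID. Assumption: a matching overlay entry replaces
// the whole base entry; new IDs are appended in overlay order.
func mergeArms(base, overlay []Arm) []Arm {
	out := append([]Arm(nil), base...)
	index := make(map[string]int, len(out))
	for i, a := range out {
		index[a.ID] = i
	}
	for _, a := range overlay {
		if i, ok := index[a.ID]; ok {
			out[i] = a
			continue
		}
		index[a.ID] = len(out)
		out = append(out, a)
	}
	return out
}

// appendHooks concatenates layers; hooks never replace each other.
func appendHooks(base, overlay []string) []string {
	return append(append([]string(nil), base...), overlay...)
}

func main() {
	base := []Arm{{ID: "a", CostWeight: 1}, {ID: "b", CostWeight: 1}}
	profile := []Arm{{ID: "b", CostWeight: 0.3}, {ID: "c", CostWeight: 1}}
	fmt.Println(mergeArms(base, profile))
	fmt.Println(appendHooks([]string{"lint"}, []string{"audit"}))
}
```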

Per-profile data: quality-<name>.json and sessions/<name>/ keep the
bandit and session list from cross-contaminating between profiles.

Profile names restricted to [A-Za-z0-9_-] to block --profile=../foo
path traversal into derived paths.
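
A minimal sketch of the selection order and the name check, assuming
hypothetical function names and error texts. The character class is the
one quoted above; anchoring it with ^ and $ is what rejects inputs such
as ../foo.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var profileNameRE = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// resolveProfile applies the order: --profile flag, then default_profile,
// then a fatal error. The name is validated before any path is derived.
func resolveProfile(flagValue, defaultProfile string) (string, error) {
	name := flagValue
	if name == "" {
		name = defaultProfile
	}
	if name == "" {
		return "", errors.New("no profile selected and no default_profile set")
	}
	if !profileNameRE.MatchString(name) {
		return "", fmt.Errorf("invalid profile name %q", name)
	}
	return name, nil
}

func main() {
	for _, flag := range []string{"work", "", "../foo"} {
		name, err := resolveProfile(flag, "private")
		fmt.Printf("flag=%q name=%q err=%v\n", flag, name, err)
	}
}
```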
2026-05-19 21:35:33 +02:00
vikingowl 0aabd19906 feat(router): per-arm strengths + cost weight (Phase D)
Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md
(static portion; dynamic bandit-driven promotion deferred to D-2).

Routing previously let tier ordering (CLI > local > API) dominate
selection — Opus, in tier 3, would lose to a tier-1 CLI agent for
SecurityReview even though Opus is empirically stronger at that task.
This change introduces explicit per-arm overrides:

  [[arms]]
  id = "anthropic/claude-opus-4-7"
  strengths = ["security_review", "planning"]
  cost_weight = 0.3

Strengths gate cross-tier promotion: arms matching task.Type bypass
the tier loop and compete with each other directly. Promotion is a
preference, not a pin — if no strength-tagged arm is feasible
(backoff, pool capacity, tool support), selection falls through to
the default tier order.
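
As an illustration of "a preference, not a pin", the sketch below
promotes feasible strength-tagged arms first and otherwise walks the
tiers in order. The Arm fields, the precomputed Score, and the tier
numbering are simplifications invented for this example.

```go
package main

import "fmt"

type Arm struct {
	ID        string
	Tier      int // 1 = CLI, 2 = local, 3 = API
	Strengths []string
	Feasible  bool
	Score     float64
}

func (a Arm) HasStrength(taskType string) bool {
	for _, s := range a.Strengths {
		if s == taskType {
			return true
		}
	}
	return false
}

// best returns the highest-scoring arm, or false for an empty slice.
func best(arms []Arm) (Arm, bool) {
	var top Arm
	found := false
	for _, a := range arms {
		if !found || a.Score > top.Score {
			top, found = a, true
		}
	}
	return top, found
}

// selectBest tries cross-tier promotion before the tier loop.
func selectBest(arms []Arm, taskType string) (Arm, bool) {
	var promoted []Arm
	for _, a := range arms {
		if a.Feasible && a.HasStrength(taskType) {
			promoted = append(promoted, a)
		}
	}
	if a, ok := best(promoted); ok {
		return a, true
	}
	for tier := 1; tier <= 3; tier++ {
		var inTier []Arm
		for _, a := range arms {
			if a.Feasible && a.Tier == tier {
				inTier = append(inTier, a)
			}
		}
		if a, ok := best(inTier); ok {
			return a, true
		}
	}
	return Arm{}, false
}

func main() {
	arms := []Arm{
		{ID: "cli-agent", Tier: 1, Feasible: true, Score: 0.6},
		{ID: "opus", Tier: 3, Strengths: []string{"security_review"}, Feasible: true, Score: 0.5},
	}
	a, _ := selectBest(arms, "security_review")
	fmt.Println(a.ID)
	a, _ = selectBest(arms, "generation")
	fmt.Println(a.ID)
}
```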

CostWeight linearly dampens the cost penalty in scoreArm via
  effectiveCost = 1 + CostWeight * (cost - 1)
CostWeight=1.0 (or unset) preserves current behavior; lower values
trade cheapness for quality. The earlier draft used cost^CostWeight
which inverts direction for sub-1 local-arm costs (raising a
fraction <1 to a fractional power makes it bigger, not smaller); a
monotonicity regression test prevents that drift.

- internal/router/arm.go: Strengths []TaskType, CostWeight float64,
  HasStrength(), ResolvedCostWeight() (zero → 1.0).
- internal/router/selector.go: scoreArm strength bonus const
  (strengthScoreBonus = 0.15) + linear cost dampening; selectBest
  cross-tier promotion before tier loop.
- internal/router/router.go: ArmOverride type + ApplyArmOverrides()
  returns unknown IDs; unknown strength names skipped with per-name
  warning via slog.
- internal/router/task.go: ParseTaskTypeStrict() returns ok bool;
  ParseTaskType now delegates so the two switches stay in sync.
- internal/config/config.go: ArmConfig + [[arms]] TOML wiring.
- cmd/gnoma/main.go: applies overrides after all initial arms
  register; logs a warning when an [[arms]] id has no matching
  registered arm.

Tests cover: predicate helpers, scoring direction across two arms,
linear-formula monotonicity on both sides of cost=1, cross-tier
promotion, empty-Strengths preserves tier order, promoted arm in
backoff falls through via full Router.Select path, observed-quality
tiebreak between two strength-tagged arms, ApplyArmOverrides happy
path + unknown-ID reporting + unknown-strength skipping.
2026-05-19 21:14:45 +02:00
vikingowl b331dcd61a feat(subprocess): per-agent binary override via [cli_agents] config
Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.

Users with aliased CLI binaries (claude-priv, claude-work,
gemini-personal) can now point gnoma's auto-discovery at them
without renaming. The override flows through to the actual subprocess
spawn at internal/provider/subprocess/provider.go:56, so routing
through the alias is functional, not cosmetic.

Config:
  [cli_agents]
  claude = "claude-priv"   # discovery uses claude-priv instead of claude
  gemini = ""              # empty value = no override (fall back to canonical)
  # vibe is absent = canonical name used

- internal/config/config.go: CLIAgentsSection map[string]string;
  TOML [cli_agents] key.
- internal/provider/subprocess/agent.go:
  - Package-level lookPath = exec.LookPath for test injection.
  - resolveAgentBinary(canonical, override) → (path, binName, err).
    Override='' falls back to canonical. Override set but missing from
    PATH returns an error (no silent fallback, which would mask user typos).
  - DiscoveredAgent.OverrideBinary records the override binary name
    when one was used; empty otherwise.
  - DiscoverCLIAgents(ctx, overrides) signature; warning logged when
    an override is configured but the binary isn't on PATH.
- cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The
  `gnoma providers` listing renders `claude-priv (via [cli_agents].claude)`
  when an override is in effect.
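
A sketch of the resolver contract described above. The signature and
the lookPath seam follow this message; the body, the error text, and
the fakeLookPath helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// lookPath is a package-level seam so tests can swap out exec.LookPath.
var lookPath = exec.LookPath

// resolveAgentBinary prefers the override when set. A missing override
// is an error: it never falls back to the canonical name silently.
func resolveAgentBinary(canonical, override string) (path, binName string, err error) {
	binName = canonical
	if override != "" {
		binName = override
	}
	path, err = lookPath(binName)
	if err != nil {
		return "", binName, fmt.Errorf("agent binary %q not on PATH: %w", binName, err)
	}
	return path, binName, nil
}

// fakeLookPath is a hypothetical test helper backed by a fixed table.
func fakeLookPath(known map[string]string) func(string) (string, error) {
	return func(name string) (string, error) {
		if p, ok := known[name]; ok {
			return p, nil
		}
		return "", errors.New("not found")
	}
}

func main() {
	lookPath = fakeLookPath(map[string]string{
		"claude":      "/usr/bin/claude",
		"claude-priv": "/opt/bin/claude-priv",
	})
	fmt.Println(resolveAgentBinary("claude", "claude-priv"))
	fmt.Println(resolveAgentBinary("claude", ""))
	fmt.Println(resolveAgentBinary("claude", "claude-typo"))
}
```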

Tests cover: 5 resolver cases (no override, override set, empty
override falls back, override missing, canonical missing); 4
discovery cases (no overrides, override resolves alias, empty value
falls back, override missing skips agent); 2 config round-trip cases.
2026-05-19 21:02:16 +02:00
vikingowl 43ea2e562d feat(engine): two-stage tool routing for small local arms
Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.

Small local SLMs (<=16k context) waste ~1500 tokens per turn on the
full tool catalogue. Two-stage routing replaces round-1 tools with a
single synthetic select_category schema; round-2+ sends only the
selected category's real tool schemas plus select_category for
re-selection.

- internal/tool/category.go: Category type, optional Categorized
  interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read,
  fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec.
- internal/engine/twostage.go: synthetic select_category tool,
  intercept helper, per-turn selectedCategory state under e.mu.
- Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to
  prose. State resets at the top and end of every runLoop.
- Activates automatically on a forced local arm with ContextWindow
  <=16384, or via [router].force_two_stage TOML key.
- Integration test drives a 3-round trip and asserts: round 1 emits
  exactly one schema (synthetic) with ToolChoiceRequired, round 2
  contains only write-category schemas + select_category, real
  fs.write executes. Invalid-category fallback round-trips back to
  round-1 mode.
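
The per-round schema choice can be sketched as below. The category
mapping is the one listed in this message; the Schema type and the
schemasForRound helper are hypothetical, and the real engine also
handles tool-choice forcing and per-turn state that this sketch omits.

```go
package main

import "fmt"

type Schema struct {
	Name     string
	Category string
}

const selectCategory = "select_category"

var catalog = []Schema{
	{"fs.read", "read"}, {"fs.ls", "read"},
	{"fs.write", "write"}, {"fs.edit", "write"},
	{"fs.glob", "search"}, {"fs.grep", "search"},
	{"bash", "exec"},
}

// schemasForRound returns what the model sees this round. With no
// category selected (round 1), or with a category that matches no tool,
// only the synthetic selector is sent. Otherwise the selected category's
// tools are sent together with the selector for re-selection.
func schemasForRound(all []Schema, selected string) []Schema {
	synthetic := Schema{Name: selectCategory}
	var out []Schema
	for _, s := range all {
		if selected != "" && s.Category == selected {
			out = append(out, s)
		}
	}
	if len(out) == 0 {
		return []Schema{synthetic}
	}
	return append(out, synthetic)
}

func main() {
	for _, sel := range []string{"", "write", "bogus"} {
		fmt.Printf("%q ->", sel)
		for _, s := range schemasForRound(catalog, sel) {
			fmt.Print(" ", s.Name)
		}
		fmt.Println()
	}
}
```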
2026-05-19 20:53:21 +02:00
vikingowl 21da29e73e docs(plan): capture post-SLM-unlock outstanding work
New dated plan at docs/superpowers/plans/2026-05-19-post-slm-unlock.md
covers the work surfaced during this session that hasn't shipped yet:

Phase A — two-stage tool routing (last item from the original
smallcode audit; gates on local + small-context arms; saves ~70% of
schema tokens per request).

Phase B — CLI agent binary override. [cli_agents] config section lets
users map canonical agent names (claude / gemini / vibe) onto local
aliases (claude-priv, gemini-work, etc.).

Phase C — user profiles. Multiple named configs (work / private /
experiment) layered over a base config.toml, switchable via
--profile flag, [config].default_profile, and a /profile TUI command.

Phase D — per-arm capability tags (Phase-4 prep). Per-arm Strengths
[]TaskType and CostWeight to make the router actually pick Opus over
Gemini for Planning/SecurityReview etc., not just for cost reasons.

Phase E — compound tools (deferred until SLM-arm telemetry shows
which chain patterns fail).

Plus an explicit drop list of things we considered and won't ship.
TODO.md updated to point at the new plan and note that the original
roadmap's Phase 4 is now superseded.
2026-05-19 19:31:40 +02:00
vikingowl a9213ec382 feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
  heuristic baseline blended so Priority/RequiredEffort are never zeroed,
  extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
  filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
  registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
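
A minimal sketch of the MaxComplexity ceiling, assuming simplified types
and a bare complexity score as input. The zero-means-no-ceiling rule
and the 0.3 ceiling for the SLM arm come from the bullets above; the
arm IDs are made up.

```go
package main

import "fmt"

type Arm struct {
	ID            string
	MaxComplexity float64 // zero means no ceiling
}

// filterFeasible drops arms whose ceiling is below the task's complexity.
func filterFeasible(arms []Arm, complexityScore float64) []Arm {
	var out []Arm
	for _, a := range arms {
		if a.MaxComplexity > 0 && complexityScore > a.MaxComplexity {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	arms := []Arm{
		{ID: "slm", MaxComplexity: 0.3},
		{ID: "cloud"}, // no ceiling configured
	}
	fmt.Println(filterFeasible(arms, 0.2))
	fmt.Println(filterFeasible(arms, 0.5))
}
```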
2026-05-07 16:44:32 +02:00
vikingowl 5569d4fb86 docs: consolidated roadmap, ADR-013, drop stale plans
- New 7-phase roadmap (2026-05-07-gnoma-roadmap.md) covering M8 cleanup,
  PTY interactive shell, SLM classifier, router revisit, USP security,
  ELF support, and distribution
- ADR-013 (002-slm-routing.md): SLM-first routing supersedes ADR-009;
  Thompson Sampling deferred pending SLM production data
- ADR-009 status updated to "Superseded by ADR-013"
- gemma-integration-analysis.md: header note that Node.js specifics
  (LiteRT-LM, daemon, PID) don't apply to gnoma's Go implementation
- TODO.md replaced with thin pointer to roadmap + stable backlog
- Deleted stale plan/spec files: m6-m7-closeout, m8-hooks-design
2026-05-07 15:06:54 +02:00
vikingowl fef38b3502 docs: M8.1 hook system design spec
2026-04-06 02:42:34 +02:00
vikingowl 43dcc7e9de docs: M6/M7 close-out implementation plan — 8 tasks, TDD, full file map
2026-04-05 21:33:42 +02:00
vikingowl 252ffde732 docs: M6/M7 close-out design spec — tool persistence, tokenizer, router feedback, coordinator
2026-04-05 21:22:26 +02:00