gnoma

Author	SHA1	Message	Date
vikingowl	f321dabce3	feat(config): Phase 3 — `gnoma doctor` diagnostic command Phase 3 of the 2026-05-24 config-migration plan. Read-only diagnostic over config files. Pairs with `gnoma upgrade-config` from the previous slice: doctor finds things upgrade-config can't fix, upgrade-config fixes the things it can. What doctor surfaces (severity-ranked): error — file unreadable, file unparseable warn — unknown top-level keys (decoder silently ignores them today) — invalid enum values (permission.mode, router.prefer, slm.backend) — explicit-zero pointer fields whose resolved value diverges from the default (e.g. max_tokens = 0 when default is 8192) info — (reserved; current diagnostics are warn+) What doctor does NOT yet surface: - Per-field zero-spam inside a partially-set section (e.g. user wrote [provider] default = "anthropic" with no other fields — those are at Go zero but the encoder's omitempty handles them on the next write). Catching this requires per-key source-tracking that BurntSushi's MetaData doesn't expose for nested fields; tracked as a follow-up. - Cross-file layering bugs (e.g. project file's prefer = "" silently shadows global's prefer = "cloud"). That requires loading the full layered config and diffing per-section — could be a follow-up to doctor, or the per-project upgrade-config --all flow. CLI surface (`cmd/gnoma doctor`): gnoma doctor scan the project config (default — cwd's .gnoma/config.toml) gnoma doctor <path> scan a specific file gnoma doctor --all-projects walk the registry, scan global + every known project gnoma doctor --json structured JSON to stdout (severity as string, suitable for CI/scripts) exit code: 0 = clean, 1 = any warn/error Help text: `gnoma -h` now lists `doctor` alongside the other subcommands. Implementation: internal/config/doctor.go Severity, Finding, Doctor, DiagnoseFile, DiagnoseFiles (~150 lines). internal/config/doctor_test.go 11 tests covering each finding type + Severity.String. 
cmd/gnoma/doctor_cmd.go CLI dispatch + JSON / text rendering + exit code. cmd/gnoma/doctor_cmd_test.go 5 tests for the CLI surface. internal/config/load.go new ProjectConfigPathFor helper for --all-projects (constructs a project config path from an arbitrary root without chdir). cmd/gnoma/main.go dispatch case + -h help text. Severity.MarshalJSON is custom: encodes the int as its lower-case name string ("warn" not 1) for stable CI consumption. Tests assert on the string form. End-to-end check on a synthetic config with multiple findings: $ gnoma doctor warn ...:permission.mode invalid permission.mode "yes" ... → fix the value, or remove the line warn ...:provider.max_tokens explicit zero for provider.max_tokens (resolved to 0); the default is 8192. ... warn ...:unknown_section unknown top-level key "unknown_section" ... warn ...:unknown_section.foo unknown top-level key "unknown_section.foo" ... exit: 1 $ gnoma doctor --json [ { "severity": "warn", "path": "...", "key": "permission.mode", "message": "..." }, ... ] Quality pipeline: gofmt -l . clean go vet ./... clean golangci-lint run ... 0 issues on touched packages go test ./... all pass (only the pre-existing TestStartBackend_Auto_NothingReachable environmental failure remains) Refs: docs/superpowers/plans/2026-05-24-config-migration.md § Phase 3.	2026-06-04 18:05:14 +02:00
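The custom `Severity.MarshalJSON` described in the commit above can be sketched as follows. This is a minimal illustration, not the code in internal/config/doctor.go: the constant names and the `encodeFinding` helper are hypothetical, and the numeric values are assumed only from the "warn" not 1 remark. The JSON field names follow the sample `--json` output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Severity ranks a finding. The numeric values are an assumption; the
// commit only implies that warn is encoded as 1 internally.
type Severity int

const (
	SeverityInfo Severity = iota
	SeverityWarn
	SeverityError
)

// String returns the lower-case name used in text and JSON output.
func (s Severity) String() string {
	switch s {
	case SeverityInfo:
		return "info"
	case SeverityWarn:
		return "warn"
	case SeverityError:
		return "error"
	default:
		return "unknown"
	}
}

// MarshalJSON encodes the severity as its name rather than its integer,
// so scripts never depend on the numeric ordering.
func (s Severity) MarshalJSON() ([]byte, error) {
	return json.Marshal(s.String())
}

// Finding mirrors the fields shown in the sample --json output.
type Finding struct {
	Severity Severity `json:"severity"`
	Path     string   `json:"path"`
	Key      string   `json:"key"`
	Message  string   `json:"message"`
}

// encodeFinding is a hypothetical helper that returns the JSON form.
func encodeFinding(f Finding) string {
	b, err := json.Marshal(f)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encodeFinding(Finding{
		Severity: SeverityWarn,
		Path:     "cfg.toml",
		Key:      "permission.mode",
		Message:  "invalid",
	}))
}
```

Encoding the name instead of the integer means the constants can be reordered later without breaking CI consumers that match on "warn" or "error".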
vikingowl	56d7217668	feat(config): Phase 2 — project registry at ~/.config/gnoma/projects.json Phase 2 of the 2026-05-24 config-migration plan. Adds a per-user list of directories gnoma has been launched in. Powers `gnoma doctor --all-projects` (Phase 3) and `gnoma upgrade-config --all` (Phase 4 --all-projects), and unblocks the cross-project session picker / stats features called out in the original plan. Schema (stable for v0.4.x): { "projects": [ { "path": "/home/user/git/foo", "first_seen": "2026-04-15T10:30:00Z", "last_seen": "2026-05-24T19:23:00Z", "session_count": 47 } ] } API: - LoadRegistry() / LoadRegistryAt(path) — read from canonical path or test-injected path. Missing file → empty registry, no error. Corrupt file → error (silent zero-ing would let broken files accumulate stale state). - (Registry).Record(projectRoot) — idempotent add/bump, atomic save. Empty projectRoot is a programmer error. - (Registry).Prune(staleBefore time.Duration) — returns the (sorted) list of pruned paths so callers can surface them in user-facing output. - RegistryFilePath() — exposes the canonical path for inspection and `rm` workflows. Implementation: - internal/config/registry.go (~120 lines): the Registry type with a sync.Mutex guarding Record/Prune. Saves through the same writeAtomicBytes helper that upgrade-config uses (temp file + sync + rename), so a crash mid-write can never leave a half-written registry. - internal/config/config.go: new SettingsSection type under `[config]` (the gnoma-level settings home — future log-level / telemetry flags will live here too). Field: ProjectRegistry bool, omitempty. nil = enabled (default true, preserves v0.3.x behavior); false = opt out. - internal/config/resolve.go: ResolvedConfig gains ProjectRegistry bool (single-bool mirror, no full ResolvedSettingsSection since there's only one field). nil → default true, false → false. Mirrors the existing nil→true convention used for SLM.RegisterAsArm. 
- internal/config/defaults.go: populates &projectRegistry{true} on Defaults(). - cmd/gnoma/main.go: calls LoadRegistry().Record(ProjectRoot()) right after the safety banner renders, gated on resolved.ProjectRegistry. Failure is logged at Warn level but never blocks startup. Also moved the resolved := cfg.Resolved() initialization to before the registry call (was previously at line 346 for the WriteTool setup) so the registry block can use it. - README.md: new bullet under §Security explaining the registry is purely local, never sent off-machine, and how to opt out via [config].project_registry = false. Tests (internal/config/registry_test.go, 14 cases): - LoadRegistryAt: missing file → empty, valid file parses, corrupt file errors. - Record: new project adds, existing bumps LastSeen + SessionCount, empty path errors, atomic-write hygiene (no .tmp- left), save→reload round-trip, creates parent dir on first save. - Prune: removes stale, keeps fresh, no-op when nothing stale, reports sorted pruned paths, empty-registry no-op, persists across reload. - Plus 2 new resolver tests for the default-true / explicit-false ProjectRegistry paths. End-to-end smoke (user's actual environment): $ cd /tmp/registry-smoke $ echo "test" | gnoma --provider ollama $ cat ~/.config/gnoma/projects.json → { "projects": [ { "path": "/tmp/registry-smoke", ... } ] } $ printf '[config]\nproject_registry = false\n' > ~/.config/gnoma/config.toml $ echo "test" | gnoma --provider ollama $ ls ~/.config/gnoma/projects.json → No such file (opt-out working) Not in this slice: - gnoma doctor --all-projects (Phase 3): uses the registry to enumerate projects, runs the per-file diagnostic from Phase 1 + the upgrade-config cleaner. - gnoma upgrade-config --all (Phase 4): walks the registry, calls Upgrade on each. - Cross-project session picker / stats: foundation is here, UI work is follow-up. Refs: docs/superpowers/plans/2026-05-24-config-migration.md § Phase 2.	2026-06-04 14:03:52 +02:00
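The "temp file + sync + rename" save path mentioned in the commit above might look roughly like this. It is a sketch under assumptions: `writeAtomic` and `demoRoundTrip` are hypothetical names (the real helper is called writeAtomicBytes and its body is not shown in the log), and the `.tmp-` prefix is taken only from the test description.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temp file next to path, syncs it to disk,
// then renames it over the destination. Readers see either the old file
// or the new one, never a partial write.
func writeAtomic(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	// Best-effort cleanup: after a successful rename the temp name no
	// longer exists and this Remove fails harmlessly.
	defer os.Remove(tmpName)

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demoRoundTrip writes twice to the same path and reads the result back.
func demoRoundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "registry-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "gnoma", "projects.json")
	for _, body := range []string{"first", "second"} {
		if err := writeAtomic(path, []byte(body)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demoRoundTrip()
	fmt.Println(got, err)
}
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes the final step atomic on POSIX systems.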
vikingowl	da5b19c159	docs(help): list config + upgrade-config in gnoma -h The `gnoma config` (set/keys) and `gnoma upgrade-config` subcommands shipped in `70cd530` and `86ae142` but were never added to the top-level `gnoma -h` Subcommands list. Add them so users discovering the binary via --help can find the new commands without having to read the changelog. Each subcommand's own help text is already accurate; this fix is purely the top-level index.	2026-06-04 13:29:38 +02:00
vikingowl	86ae142dfe	fix(upgrade-config): friendly "no such file" + add --global flag First-run UX fix. `gnoma upgrade-config --dry-run` in a directory with no `.gnoma/config.toml` used to error with: error: read config: open .../.gnoma/config.toml: no such file or directory That's a hard error for what's actually a non-event. The cleanest user experience: tell the user there's no project config to upgrade, hint that they can pass an explicit path or use --global, and exit 0. Changes: 1. `cmd/gnoma/upgrade_config_cmd.go::runUpgradeConfigCommand` now stats the target before calling `gnomacfg.Upgrade`. For the implicit project/global targets, a missing file produces a friendly exit-0 message. An explicit path the user typed is still a hard error (caller asked for that specific file, didn't get it). 2. New `--global` flag, symmetric with `gnoma config set --global`. The user-level config is where zero-spam actually accumulates over time (most users never have a project config) so this is the more useful default target in practice. `--global <path>` is rejected as mutually-exclusive. 3. Rewrote the flag-parsing loop to avoid a Go slice-aliasing bug discovered while writing the tests. The original implementation did `pathArgs = append(args[:i], args[i+1:]...)` inside a `for i, a := range args` loop, which aliases the underlying array and overwrites earlier `a` values on subsequent iterations. With `--global --dry-run` the `--dry-run` overwrote `args[0]`, so the second iteration read `--dry-run` as `a` for both the `--dry-run` and `--global` cases. The new code walks `args` once and accumulates into a fresh `pathArgs` slice, no aliasing. 
Tests added in upgrade_config_cmd_test.go: - TestRunUpgradeConfig_MissingProjectConfigIsFriendly - TestRunUpgradeConfig_MissingGlobalConfigIsFriendly - TestRunUpgradeConfig_GlobalFlagUpgradesGlobalConfig - TestRunUpgradeConfig_GlobalWithExplicitPathIsError End-to-end check on the user's actual environment: $ gnoma upgrade-config --dry-run /home/.../gnoma/.gnoma/config.toml: no such file, nothing to upgrade hint: pass an explicit path, or use --global for the user-level config exit: 0 $ gnoma upgrade-config --global --dry-run /home/.../.config/gnoma/config.toml: already clean, nothing to do (dry run) exit: 0	2026-06-04 13:26:01 +02:00
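The slice-aliasing fix in point 3 of the commit above can be illustrated with a short sketch. `parseUpgradeArgs` is a hypothetical stand-in for the real flag loop in runUpgradeConfigCommand; only the two flags named in the commit are handled.

```go
package main

import "fmt"

// parseUpgradeArgs walks args once and appends non-flag arguments to a
// fresh slice. Unlike append(args[:i], args[i+1:]...), this never writes
// into the backing array that the range loop is still reading from.
func parseUpgradeArgs(args []string) (dryRun, global bool, pathArgs []string) {
	for _, a := range args {
		switch a {
		case "--dry-run":
			dryRun = true
		case "--global":
			global = true
		default:
			// pathArgs starts nil, so append allocates its own storage.
			pathArgs = append(pathArgs, a)
		}
	}
	return dryRun, global, pathArgs
}

func main() {
	dryRun, global, paths := parseUpgradeArgs([]string{"--global", "--dry-run", "cfg.toml"})
	fmt.Println(dryRun, global, paths)
}
```

Because the input slice is only read, the order of flags no longer matters, which is the property the `--global --dry-run` case was violating.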
vikingowl	70cd530578	feat(config): upgrade-config command + Duration pointer fix Closes the two follow-up caveats from the 2026-06-04 config-migration follow-up plan: Caveat 1 — Duration pointer conversion SLM.StartupTimeout and SLM.ClassifyTimeout are now Duration (pointer) instead of bare Duration. nil = "use documented default" (5s and 0s respectively); Duration(0) = explicit zero. ResolvedSLMSection added to the mirror so consumers read resolved time.Duration values instead of the raw pointer. cmd/gnoma/main.go, profile_cmd, and the SLM startup wiring all move through the mirror. The remaining cosmetic encoder issue (startup_timeout = 0 / classify_timeout = 0 written even with omitempty) is fixed because the BurntSushi encoder now sees a nil pointer when the user didn't set the field. ResolvedSLMSection's RegisterAsArm mirrors the existing nil→true default-substitution semantics from the field's doc comment; the if-nil check in main.go is collapsed to a direct read of resolved.SLM.RegisterAsArm. Caveat 2 — `gnoma upgrade-config` (single-file mode) New command that cleans a config file in place: drops pointer-converted fields whose resolved value matches the resolved default, leaves explicit-zero pointer fields alone (the "explicit zero preserved" contract from Phase 1), and writes the cleaned form atomically with a .bak-YYYYMMDD-HHMMSS backup of the original. Idempotent — a second run on the cleaned file reports "already clean, nothing to do" without creating a second backup. Cleaning rules per field type (encoded in internal/config/upgrade.go::clean): - pointer-converted fields: null iff resolved value equals resolved default - non-pointer string / map / slice / numeric / bool fields: encoder's omitempty already handles them on rewrite; the cleaner doesn't touch them Diff output uses a simple line-by-line algorithm (added/removed/neutral) via splitLines + a forward scan. Adequate for the small config files gnoma produces. 
A proper Myers diff could be vendored later — pmezard/go-difflib is already a transitive dep in go.sum. internal/config/load.go::ProjectConfigPath is now exported so the CLI can default the upgrade target to the project config when no path is given. --dry-run runs the upgrade then restores the file from the backup so the operation is truly side-effect-free. Scope notes Single-file mode only. --all-projects is deferred until the project registry (Phase 2 of the 2026-05-24 plan) lands — the follow-up doc calls this out as the natural next slice and it can be added as a follow-up PR without touching upgrade-config's core semantics. No-op test cases (TestUpgrade_NoChangesOnAlreadyCleanFile, TestUpgrade_KeepsExplicitUserValues, TestUpgrade_KeepsExplicitZeroPointerFields) assert the "resolved view is identical before and after" contract. Test coverage internal/config/upgrade_test.go: 10 tests (drops, keeps, backup, idempotency, diff, edge cases) internal/config/resolve_test.go: +3 tests for ResolvedSLM internal/config/write_test.go: +1 test for the Duration emission fix cmd/gnoma/upgrade_config_cmd_test.go: 3 tests for the CLI Refs: docs/superpowers/plans/2026-06-04-config-migration-followups.md	2026-06-04 13:18:30 +02:00
vikingowl	db7a47012e	docs(config): follow-up plan for Phase 1 caveats (Duration, pre-existing zero-spam, Bandit sentinel) Captures the three caveats shipped in a9bba42's commit body as a tracked plan: Duration fields still emit as int64, pre-existing zero-spam isn't auto-cleaned, BanditSection keeps the 0-sentinel pattern. Sizes and orders the follow-ups so Phase 2/3/4 of the original config-migration plan stay decomposable into independent PRs.	2026-06-04 12:54:47 +02:00
vikingowl	a9bba42c3d	fix(config): stop generating zero-spam on setConfig; add Resolved mirror The 2026-05-24 silent-corruption symptom: a `gnoma config set provider.default anthropic` call read the existing TOML into a zero-valued Config, set one field, then wrote the entire struct back. Every untouched field was serialized at its Go zero value (`mode = ""`, `max_tokens = 0`, etc.), and on the next layered load those present-but-zero fields silently shadowed higher-priority layers per TOML's "present field wins" semantics. This is Phase 1 of the 2026-05-24 config-migration plan: encoder-side only. Phases 2-5 (registry, doctor, upgrade-config, auto-migration) follow in subsequent slices. The fix is the hybrid approach the plan chose: - `,omitempty` on every string / map / slice field so absent keys aren't re-emitted. - Pointer conversion for the seven fields where the Go zero (`0`, `false`, `0.0`) is a legitimate user choice and the absent-vs-explicit-zero distinction matters: Provider.MaxTokens, Tools.MaxFileSize, Security.EntropyThreshold, Security.RedactHighEntropy, Router.ForceTwoStage, Session.MaxKeep, HookConfig.FailOpen. nil (absent) and *zero (explicit) are now distinguishable; the new Resolved() mirror substitutes Defaults() for nil so consumers see a clean concrete value. - Defaults() populates the new pointer fields with their default values so the resolver substitution is a no-op for the common case of "user didn't set it". - ResolvedConfig + Resolved() follow the ResolvedSafetySection precedent: a separate mirror type, constructed at the end of Load, with the boundary rule "raw cfg.X is internal; readers go through cfg.Resolved().X for pointer-converted fields". - setConfig now uses an atomic temp+rename write (writeAtomicTOML) so a crash mid-write can't leave a half-written config file. CLI surface: `gnoma config set [--global] <key> <value>` and `gnoma config keys` replace the dead help-string reference at cmd/gnoma/main.go:1538. 
All consumers of pointer-converted fields (cmd/gnoma/main.go, cmd/gnoma/profile_cmd.go, internal/hook/, internal/plugin/) move to the Resolved mirror. Test coverage: 6 resolver tests + 7 write tests + 3 CLI tests in the affected packages. Full go test ./... is green except for a pre-existing llamafile health-check timeout in internal/slm/backend_test.go that's environmental and unrelated to this change. Caveats (carried as follow-up work, not blockers): 1. Duration-typed fields (SLM.StartupTimeout, SLM.ClassifyTimeout) still emit as raw int64 even at zero. BurntSushi's encoder doesn't honor omitempty on the custom Duration type without a MarshalText method, and the existing MarshalText-less Duration type predates this fix. Cosmetic-only: 0 is the documented "use default" sentinel for both fields, so the value is semantically correct. Fix is a separate pointer-conversion PR on those two fields. 2. Pre-existing zero-spam in user config files is not auto-cleaned by a setConfig call on a different key. The user's recovery path remains: re-set the affected key (which the new omitempty + pointer semantics now rewrite correctly), or run `gnoma upgrade-config` (Phase 4). 3. BanditSection keeps the documented 0-sentinel pattern (0 = "use built-in default"). Pointer conversion was deliberately out of scope per the plan. Refs: docs/superpowers/plans/2026-05-24-config-migration.md	2026-06-04 12:52:55 +02:00
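The absent-versus-explicit-zero distinction and the Resolved mirror from the commit above can be sketched as below. The types and the `resolve` function are simplified, hypothetical stand-ins for Config and Resolved(); the 8192 default for max_tokens is the value quoted in the doctor commit in this log.

```go
package main

import "fmt"

// ProviderSection stands in for a raw config section. A nil pointer means
// the key was absent; a pointer to 0 means the user wrote an explicit zero.
type ProviderSection struct {
	MaxTokens *int
}

// ResolvedProvider is the concrete mirror that consumers read.
type ResolvedProvider struct {
	MaxTokens int
}

// defaultMaxTokens is assumed from the "default is 8192" doctor message.
const defaultMaxTokens = 8192

// resolve substitutes the default only when the field is absent, so an
// explicit zero survives untouched.
func resolve(p ProviderSection) ResolvedProvider {
	if p.MaxTokens == nil {
		return ResolvedProvider{MaxTokens: defaultMaxTokens}
	}
	return ResolvedProvider{MaxTokens: *p.MaxTokens}
}

func main() {
	zero := 0
	fmt.Println(resolve(ProviderSection{}).MaxTokens)                 // absent key
	fmt.Println(resolve(ProviderSection{MaxTokens: &zero}).MaxTokens) // explicit zero
}
```

With a plain `int` field both cases would decode to 0 and the encoder could not tell which one to omit, which is the ambiguity the pointer conversion removes.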
vikingowl	f8ab522bef	docs(todo,plans): specs for open features + MiniMax & ACP Add implementation-ready plans for the in-flight features that lacked one, and two new provider/protocol items: - MiniMax provider (cloud arm + Token Plan billing decision) - Agent Client Protocol (ACP) — dual role: gnoma as ACP agent and as ACP client driving external agents as router arms - Network egress allowlist (Learn/Review/Enforce); note the per-session audit log is already implemented, remaining gap is a viewer command - Cross-platform (Windows/macOS) code touch-points + build-tag pattern - Distribution follow-ups (cosign, brew tap, installer, dockers_v2) Link each plan from its TODO.md entry; mark audit-log item done.	2026-06-04 11:59:16 +02:00
vikingowl	98daebd359	docs(todo): cross-platform support — phase-breakdown + r/devops question map Extends the cross-platform smoke-test entry surfaced 2026-05-28 into a three-phase plan with concrete handles per concern: Phase 1 — CI smoke matrix per tag (linux/darwin/windows × amd64/arm64). Confirms the binary actually executes before any real bug-hunting. Phase 2 — Windows-specific concerns mapped to the r/devops question pattern u/HarjjotSinghh predicted ('crowd will ask within a week'). Each row: expected question, the gnoma-side gap it exposes, and the rough fix scope. Covers PowerShell shell quoting, WSL vs native, corporate-proxy / PAC support, Authenticode signing, MSI installer, Event Viewer integration, Group Policy hooks, and air-gapped install flow (ollama-dependency gap). Phase 3 — macOS concerns: Apple-silicon launch sanity + Gatekeeper / notarization warning on first run. Pre-condition added for the eventual r/devops post: Phase 1 must be in place before posting so the 'did you test it?' question has an honest answer. Phase 2 items each need at least TODO acknowledgement in the post body so the thread sees the gaps are tracked.	2026-05-27 19:13:01 +02:00
vikingowl	a468c3d2ed	docs(readme,todo): origin paragraph + egress design refinement README About section gets an Origin subsection describing how gnoma's security-first positioning emerged — not the original goal (provider-agnostic coding CLI was), but the answer to a gap that became obvious while building. Honest framing: 'the answer to what was missing, not the goal it set out with.' TODO updates from the r/SideProject thread (u/HarjjotSinghh, 2026-05-28) refine the security-boundary egress entry with a three-stage Learn → Review → Enforce rollout (was previously just 'open design question: host-level vs per-tool'). Captures the default allowlist baseline (package ecosystems + model providers), the SDK-egress middle ground (sentry/stripe/supabase), and the per-tool scoping layer above the project-wide allowlist. Also adds a new TODO entry for cross-platform smoke tests — Windows and macOS binaries ship every release but only Linux is exercised. Surfaced when answering 'are you planning a Windows build?' on the same thread and honestly couldn't claim the binaries are tested.	2026-05-27 18:56:59 +02:00
vikingowl	7213a1e2fd	docs: switch recommended SLM from reecdev/tiny3.5:500m to qwen3:0.6b Empirical comparison on 2026-05-25 across three candidate SLMs on identical prompts (two prompts: trivial 'what is 2+2' + knowledge 'explain a multi-armed bandit'): qwen3:0.6b consistent across both prompts functiongemma:270m works trivial, derails on knowledge prompts gemma3:1b unusable (emits just '{' or invented keys) reecdev/tiny3.5:1.5b unusable (ignores /no_think, leaks <Thought Process> blocks) qwen2.5-coder:1.5b unusable (ignores classifier prompt, answers in prose) qwen3:0.6b honours Qwen3's native /no_think flag (the distillation in the old default did not), is smaller than the previous recommendation (520 MB vs 1 GB), and was the only candidate to classify both test prompts successfully without falling back to heuristic. README quickstart block + slm-backends.md presets + status output sample all switched. Also documents register_as_arm (default true, set false for task-specialised models like FunctionGemma) and classify_timeout (default 15s) in the example configs since both landed in v0.3.3+. Code defaults for the tiny3.5 family in internal/router/defaults.go are unchanged — that table still applies when users have tiny3.5 registered as a routing arm independent of the SLM role. v0.3.4	2026-05-25 02:43:11 +02:00
vikingowl	fd327107df	fix(router/discovery): always probe ollama capabilities, cache is optional DiscoverOllama() interpreted a nil probeCache as 'skip probing entirely' rather than 'probe but don't cache.' cmd/gnoma/main.go's synchronous discovery path passes nil, so every ollama-discovered model got SupportsTools=false (the Go zero value), regardless of what ollama actually reported in its capabilities field. The symptom: filterFeasible rejected every ollama arm for any tool-requiring task with reason=tools_required_but_unsupported, even when ollama itself reported the model as tool-capable. Verified via curl: qwen3:14b advertises capabilities=[completion, tools, thinking] and has 'tools' in its template, but the gnoma arm shipped with tool_use_capability=false. Fix: always run probeOllamaModel; treat probeCache as an optional memoisation aid only. nil cache now means 'no caching across calls' not 'no probing.' For users with many models, passing a real cache still avoids redundant HTTP calls — semantics for that path are unchanged. Surfaced via the new filterFeasible Debug logging from the previous commit, which made the per-arm rejection reasons visible.	2026-05-25 02:28:05 +02:00
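The "nil cache means probe but don't cache" semantics from the commit above can be sketched like this. The `discover` function, the `probeResult` type, and the injected `probe` callback are hypothetical simplifications of DiscoverOllama and probeOllamaModel, whose real signatures are not shown in the log.

```go
package main

import "fmt"

// probeResult holds what a capability probe learned about one model.
type probeResult struct {
	SupportsTools bool
}

// discover always probes a model unless a cached result exists. A nil
// cache disables memoisation only; it never disables probing.
func discover(models []string, cache map[string]probeResult, probe func(string) probeResult) map[string]probeResult {
	out := make(map[string]probeResult, len(models))
	for _, m := range models {
		// Reading from a nil map is safe in Go and reports "not found".
		if r, ok := cache[m]; ok {
			out[m] = r
			continue
		}
		r := probe(m)
		if cache != nil {
			cache[m] = r // writing to a nil map would panic, so guard it
		}
		out[m] = r
	}
	return out
}

func main() {
	calls := 0
	probe := func(model string) probeResult {
		calls++
		return probeResult{SupportsTools: true}
	}
	arms := discover([]string{"qwen3:14b"}, nil, probe)
	fmt.Println(arms["qwen3:14b"].SupportsTools, calls)
}
```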
vikingowl	0d3d190a8b	fix(slm,session,router): classifier-only SLMs + session error recovery + feasibility diagnostics Three coupled fixes that surfaced from a single FunctionGemma test session where the SLM-as-execution-arm assumption broke down and every subsequent prompt failed with 'session not idle (state: error)'. (A) [slm].register_as_arm config. The SLM has always been unconditionally registered as both classifier AND tier-0 execution arm. Fine for general-purpose models (ministral, qwen3-chat); breaks for task-specialised models (FunctionGemma emits function-call syntax instead of prose; embedding models can't generate). New pointer-bool config: nil/absent preserves the historical default (true), explicit false makes the SLM classifier-only and the execution path skips the slm/* arm. Three table tests cover absent / explicit-false / explicit-true decode paths. (B) Session error recovery. After any routing or engine error, the session moved to StateError and stayed there until restart — every new user prompt got rejected with 'session not idle (state: error)'. ResetError() was already wired for the /init retry path, but the general user-input and slash-command paths didn't call it. Added ResetError() before every user-initiated Send in the TUI so a fresh prompt always represents intent-to-retry. The /init internal retry already had its own ResetError; left alone. (C) filterFeasible per-arm rejection logging. Today's 'no feasible arm for task X' error tells you THAT every arm was rejected but nothing about WHY. Added slog.Debug per rejection (arm, task, complexity, reason, the specific violated constraint) plus a summary line when zero arms are feasible at any quality. Visible with --verbose; quiet otherwise. Surface area expansion only — no behaviour change for users not chasing a bug.	2026-05-25 01:57:16 +02:00
vikingowl	c065a2dea7	fix(provider/openai): wire ResponseFormat into OpenAI request params provider.Request.ResponseFormat was being silently dropped by the openai translation layer (translate.go:translateRequest). The upstream provider type and the openai-go SDK both supported it; the adapter just never propagated it. This is why Move 1 (set ResponseFormat=ResponseJSON in the SLM classifier) produced zero observable change: the field made it from the classifier into provider.Request but stopped at the OpenAI translation step. The ollama backend (used via the OpenAI-compatible endpoint) therefore never received format=json_object and kept emitting free-form prose, which the classifier's downstream JSON parser duly rejected — 50 fallbacks in a row across two model swaps. Translate provider.ResponseJSON to oai.ResponseFormatJSONObjectParam and provider.ResponseText to oai.ResponseFormatTextParam; leave the union zero-valued when the caller didn't set ResponseFormat so the SDK omits the field per its omitzero tag. Three table cases cover the json / text / unset paths. Affects ollama, llama.cpp, llamafile, and any other backend reached via openaicompat — all run through openai.translateRequest.	2026-05-25 01:26:38 +02:00
vikingowl	24945b1eb2	docs(plans): encoder + contextual-bandit router architecture Captures the architectural research surfaced during the 2026-05-25 SLM-failure diagnostic session: RouteLLM treats routing as classification, ModernBERT is well-suited to that classification, and FunctionGemma fits as an optional JSON-sanity layer rather than the primary classifier. The current decoder-SLM-as-classifier design is the wrong shape (100% failure rate observed across two model swaps). Five-phase plan: 1. Embedding feature scaffold (near-term, additive, opt-in) 2. Contextual bandit (LinUCB / Thompson) over the feature set 3. Retire the decoder-SLM classifier once 2 outperforms 4. ModernBERT fine-tune on the accumulated labelled data 5. FunctionGemma JSON sanity layer (optional final stage) Phase 1 is the only piece scoped for near-term implementation; the rest is multi-month and hinges on the strategic 'EMA vs SLM' question already tracked in TODO. Cross-references the existing tool-router-specialization plan so a reader of either lands on both. Updates the TODO entry for the bandit selector to note the supersession path.	2026-05-25 01:22:18 +02:00
vikingowl	c0c2e4bff5	fix(slm): enforce JSON output + strip thinking-block prefixes Two structural fixes for the SLM classifier's 100% failure rate: (1) Pass ResponseFormat=json_object + Temperature=0 + TopP=1 + MaxTokens=128 in the classifier Request. The provider type already supports these but callSLM was leaving them unset, which meant ollama (and any other backend) ran with default sampling and free-form text output. format=json mode in particular makes ollama emit only valid JSON at decoding time — eliminates the majority of parse failures. (2) Harden extractJSON to strip common thinking-block tags before hunting for the brace. Seen in the wild: <think>…</think> (Qwen3 distillations) and <Thought Process>…</Thought Process> (tiny3.5). Defensive list also covers <reasoning>, <thoughts>. Unterminated thinking blocks fall back to brace-search so we still have a shot. Table-driven tests cover all variants plus the no-tag and fenced-json paths to confirm no regression. Even with format=json on a capable provider, the extractor is the safety net for backends that don't enforce format strictly — same defence-in-depth shape as the existing fence stripping. Doesn't fix the deeper architecture question (encoder + bandit preferred over decoder-SLM as classifier — see plan doc landing in the same PR); fixes the immediate bug.	2026-05-25 01:19:51 +02:00
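The extractor hardening in fix (2) of the commit above could be sketched as follows. This is an assumption-laden illustration, not the actual extractJSON: the tag list is taken from the commit text, `stripThinking` is a hypothetical helper, and fence stripping is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// thinkingTags lists the block names mentioned in the commit message.
var thinkingTags = []string{"think", "Thought Process", "reasoning", "thoughts"}

// stripThinking removes every terminated <tag>...</tag> block. An
// unterminated block is left in place so the brace search can still run.
func stripThinking(s string) string {
	for _, tag := range thinkingTags {
		openTag, closeTag := "<"+tag+">", "</"+tag+">"
		for {
			start := strings.Index(s, openTag)
			if start < 0 {
				break
			}
			end := strings.Index(s[start:], closeTag)
			if end < 0 {
				break // unterminated: fall back to brace search
			}
			s = s[:start] + s[start+end+len(closeTag):]
		}
	}
	return s
}

// extractJSON returns the span from the first '{' to the last '}' after
// thinking blocks are stripped, or "" when no such span exists.
func extractJSON(s string) string {
	s = stripThinking(s)
	start := strings.Index(s, "{")
	end := strings.LastIndex(s, "}")
	if start < 0 || end < start {
		return ""
	}
	return s[start : end+1]
}

func main() {
	fmt.Println(extractJSON(`<think>{not this}</think> {"task":"chat"}`))
}
```

Stripping before the brace search matters because a thinking block may itself contain braces, which would otherwise be picked up as the start of the payload.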
vikingowl	f3c70bd802	fix(slm,router): honest classifier diagnostics + 15s default timeout Five fixes folded into one commit because they all answer the same question: 'why does my router stats output lie to me?' Issue 1 (timeout). Default classify timeout was 5s — too short for cold-start ollama loads on small models. Bumped to 15s and surfaced as [slm].classify_timeout (0 = built-in default). Empirically caught when a user's reecdev/tiny3.5:1.5b hit 'stream error: context deadline exceeded' on every single classify call. Issue 2 (Warn-level error). The SLM-fallback path logged the underlying error at Debug, invisible without --verbose. Promoted to Warn so a first-time misconfiguration surfaces immediately. The fallback itself is benign; the signal is that the SLM isn't doing the work it was supposed to. Issue 3 (stats hint). Hard-coded 'check that llamafile boots' even when the user is on ollama. Replaced with backend-templated advice read from cfg.SLM.Backend. Also distinguishes three diagnostic cases that were collapsed before: - SLM never called (zero attempts) - SLM called N times but every call fell back (timeout/parse) - SLM working but minority share Issue 4 (effective heuristic share). The classifier breakdown shows 'heuristic' and 'slm_fallback' as separate sources, but both routed through HeuristicClassifier — only the source tag differs. New line under 'total observations' surfaces the combined share honestly: 'effective heuristic share: 100% (44 fallbacks + 10 pure heuristic)'. Issue 5 (config schema). [slm].classify_timeout joins the existing [slm] knobs alongside startup_timeout. Documented inline with the cold-start-load rationale.	2026-05-25 01:05:57 +02:00
vikingowl	fa65a68728	docs(plans): config-migration and sensitive-content-policy Promotes two TODO entries into phased plan docs and links them from the TODO bullets. config-migration plan covers the silent layered-config corruption chain (encoder zero-spam -> reader overwrite -> wrong effective values) and its remediation across five phases: encoder fix (omitempty + pointer-numeric hybrid), project registry, gnoma doctor, gnoma upgrade-config, and auto-migration on startup with banner notice. sensitive-content-policy plan unifies three input paths (pasted text, pasted images, tool-read files) behind one decision API with consistent UI surface and audit-log integration. Phases A-E sequence the work from highest-leverage (text paste) to most complex (image OCR with local vision arm). Neither plan starts implementation in this commit — they exist to make the design decisions explicit so the eventual code can be reviewed against a written intent rather than a TODO bullet. v0.3.3	2026-05-24 22:51:33 +02:00
vikingowl	8b9bdc2978	feat(security): per-session firewall audit log New AuditLogger writes one JSON line per firewall action to <projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl so a user can grep 'what did the firewall do this session?' after the fact. Records 'block', 'redact', 'warn', and 'unicode_sanitize' events with the matcher name, source (tool_result / message_text / etc.), and token length. Discipline: never the bytes themselves — only the matcher name and the length, matching the README's scope-note promise about audit data. Plumbing: - Firewall gains an audit *AuditLogger field plus SetAudit setter. The firewall is constructed before the session ID exists, so the audit logger is wired post-hoc once main.go has the sessionID. - Honours incognito: Record is a silent no-op when the firewall's IncognitoMode is active, preserving the no-persistence contract. - Tolerant of fs errors: mkdir / open / encode failures log a Warn but never propagate; the scan pipeline must not depend on audit succeeding. - Nil receiver is a valid no-op so callers don't need nil-guards around every Record. Tracks 'Security boundary — per-session audit log' from the v0.3.0 r/SideProject launch thread (u/Secret_Theme3192, 2026-05-24). Per-host egress allowlist remains separately tracked pending the commenter's reply on host-level vs per-tool semantics.	2026-05-24 22:47:28 +02:00
vikingowl	eea26a262e	feat(router): surface bandit knobs as [router.bandit] config Four hardcoded constants in the selector and feedback tracker are now user-tunable via [router.bandit]: - quality_alpha (EMA smoothing, default 0.3) - min_observations (samples before observed overrides heuristic, default 3) - observed_weight (observed/heuristic blend ratio, default 0.7) - strength_bonus (quality bonus for Strengths-tagged arms, default 0.15) Each field treats 0 as 'use default', so an empty TOML block is byte-identical to pre-config behaviour. BanditParams is plumbed via router.Config{Bandit: ...} and resolveBanditParams() centralises the fallback so every call site shares the same defaults. QualityTracker, scoreArm, bestScored, and selectBest signatures now take the configured values directly rather than reaching for package- level constants. Tests updated to pass BanditParams{} (defaults) or explicit overrides where they validate the new tuning paths. Tracks item #3 from the 'Bandit selector — design decisions deferred' TODO entry — ships independently of the EMA vs SLM strategic decision.	2026-05-24 22:42:34 +02:00
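The "0 means use default" fallback described in the commit above can be sketched as follows. This is a minimal illustration, not the project's code: the struct field names are assumptions, and only the four keys and their default values come from the commit message.

```go
package main

import "fmt"

// BanditParams mirrors the four [router.bandit] keys from the commit message.
// The Go field names are assumptions made for this sketch.
type BanditParams struct {
	QualityAlpha    float64 // EMA smoothing
	MinObservations int     // samples before observed quality overrides the heuristic
	ObservedWeight  float64 // observed/heuristic blend ratio
	StrengthBonus   float64 // quality bonus for Strengths-tagged arms
}

// resolveBanditParams fills every zero field with its documented default,
// so an empty config block behaves like the old hardcoded constants.
func resolveBanditParams(p BanditParams) BanditParams {
	if p.QualityAlpha == 0 {
		p.QualityAlpha = 0.3
	}
	if p.MinObservations == 0 {
		p.MinObservations = 3
	}
	if p.ObservedWeight == 0 {
		p.ObservedWeight = 0.7
	}
	if p.StrengthBonus == 0 {
		p.StrengthBonus = 0.15
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{}))
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{ObservedWeight: 0.9}))
}
```

One consequence of this scheme is that a literal zero cannot be expressed: in this sketch, a zero strength bonus would be read as "unset" and replaced by 0.15.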
vikingowl	352cab4a94	docs(todo): extend config-migration plan with project registry Adds item #5 to the config write/merge corruption entry: ~/.config/gnoma/projects.json tracking which directories gnoma has been launched in. Enables doctor --all-projects, cross-project session listing, and one-shot upgrade-config across all known projects. Documents the design constraints: must use the same omitempty / atomic-write discipline as the encoder fix to avoid recreating the class of bug it exists to help solve. Privacy footprint flagged (local-only directory log; opt-out toggle). Stale-entry handling gated through doctor, not auto-prune. v0.3.2	2026-05-24 22:29:56 +02:00
vikingowl	58f4001917	docs(todo): track config write/merge corruption + doctor/upgrade design setConfig() serializes the entire Config struct on every key change, which writes zero-valued fields into the file. On the next load those explicit zeros override higher-priority layers via toml.Decode's present-beats-absent semantics. Concrete symptom today: a global prefer = 'cloud' was silently shadowed by a project prefer = ''. Captures the multi-part fix surface so it doesn't get half-done: - Stop generating zero-spam (omitempty hybrid or pelletier swap). - gnoma doctor: read-only diagnostic (zero-spam, invalid enums, removed keys, effective-merged values). - gnoma upgrade-config: active migration with .bak backup + diff. - Auto-migrate project-level on startup with TUI banner notice; global stays explicit.	2026-05-24 22:24:59 +02:00
vikingowl	6c5e969217	feat(tui): add /router command for runtime routing-preference switch Mirrors the pattern of /permission: bare command shows the current value plus a help line; with an argument (auto/local/cloud) it calls Router.SetPreferPolicy and emits a system message. Session-only — does not write back to config.toml, matching /permission and Ctrl+X incognito-toggle conventions. Tab completion on the value via routerPreferModes alongside the existing permissionModes pattern. Help text updated. Status-bar indicator deferred (separate concern if it turns out to be wanted).	2026-05-24 22:13:27 +02:00
vikingowl	74bd570438	fix(tui): de-dupe /init in command picker; skill names shadow builtins /init appeared twice in the completion picker — once from the static builtinCommands list and once from the bundled init skill at internal/skill/skills/init.md (registered via skills.All()). Two changes: - Remove /init from builtinCommands. The skill provides the canonical entry, and its description ('Generate or update AGENTS.md project documentation') is more accurate than the static one ('initialize project — create AGENTS.md') because the skill handles both create and update. - Refactor completionSource() so a skill name silently shadows any builtin with the same name. Prevents this from recurring if a future builtin migrates to a skill, and lets users override a builtin's description by dropping a skill of the same name into .gnoma/skills/.	2026-05-24 22:08:46 +02:00
vikingowl	d38d7daf25	fix(subprocess/agy): disable ToolUse until stream-json lands agy is registered with FormatAgyText and the agyParser emits every stdout line as a plain EventTextDelta. There is no path for a structured ToolCall event to come back. With ToolUse=true the router would dispatch tool-needing tasks (security_review, spawn_elfs, file edit) to agy; the underlying Gemini model would describe calling the tool in prose — invented UUIDs and 'I will pause now'-style stubs — the engine would receive only text, and the turn would hang waiting for a tool call that never arrives. Surfaced when /init routed to agy for a security_review task and elf spawning visibly hallucinated in the TUI. Capability flag flipped to false; agy stays usable for tool-free prompts (explain, summarize, simple chat). TODO entry for native stream-json updated to flag that the capability flip is part of that same change.	2026-05-24 21:58:22 +02:00
vikingowl	06d4069076	ci: pin GoReleaser to the triggering tag, fix tag-collision regression When v0.3.1 was tagged on the same commit as v0.3.1-rc2, the release workflow built and tried to publish rc2 artifacts instead of v0.3.1, failing with 'already_exists' on every asset upload. Root cause: goreleaser-action@v6 + 'version: latest' (locked to v2.x) falls back to 'git describe --tags' for the current tag, which picked v0.3.1-rc2 over v0.3.1 when both refs pointed at HEAD. Explicitly setting GORELEASER_CURRENT_TAG = github.ref_name forces the workflow to use the tag that triggered it, regardless of other refs at the same commit. v0.3.1	2026-05-24 17:36:01 +02:00
vikingowl	f641bd4971	docs(todo): track bandit selector design questions Two related items surfaced from the r/coolgithubprojects v0.3.1 launch thread. Bundled because they share the selector code: 1. Whether to keep numeric EMA at all post-SLM dispatcher (open strategic question from the 2026-05-07 roadmap — not a must-implement). 2. Surfacing hardcoded selector knobs (qualityAlpha, blend ratio, strength bonus, quality floor) as [router.bandit] config keys — ships independently of #1.	2026-05-24 17:34:13 +02:00
vikingowl	798f2ab3c3	fix(release): prerelease auto-detect; changelog excludes scoped conventional commits Two polish issues surfaced by the v0.3.1-rc1 pipeline test: - The release was tagged v0.3.1-rc1 but published without the prerelease flag, so it appeared alongside stable releases. Add 'prerelease: auto' to release.github so GoReleaser marks any tag with a semver prerelease suffix (-rc, -beta, -alpha, -pre) appropriately. - The changelog filters used '^docs:' patterns that only match bare conventional commits. Scoped variants like 'docs(readme):' and 'chore(make):' slipped through into the published changelog. Switch to '^docs[:(]' style patterns to match both forms, and add '^style[:(]' so gofmt-drift commits are excluded too. v0.3.1-rc2	2026-05-24 17:05:49 +02:00
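The difference between the two changelog filter patterns in the commit above can be checked directly with Go's regexp package. The two patterns are taken from the commit message; the sample subject lines are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Old filter: only matches bare conventional commits such as "docs: ...".
	oldDocs = regexp.MustCompile(`^docs:`)
	// New filter: the character class accepts either ":" or "(" after the
	// type, so scoped commits such as "docs(readme): ..." match as well.
	newDocs = regexp.MustCompile(`^docs[:(]`)
)

func main() {
	subjects := []string{
		"docs: fix typo",
		"docs(readme): add TOC",
		"docstring cleanup in router", // must not be excluded by either filter
	}
	for _, s := range subjects {
		fmt.Printf("%-30q old=%-5v new=%v\n", s, oldDocs.MatchString(s), newDocs.MatchString(s))
	}
}
```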
vikingowl	9814795b3c	ci: migrate release pipeline from Woodpecker to GitHub Actions Drop the broken .woodpecker/release.yml (top-level when: triggered an 'error' status on every dev push instead of skipping non-tag events) and replace with .github/workflows/release.yml driving the same GoReleaser flow. Rationale: - Release artifacts already land on GitHub (releases + ghcr.io), so running the pipeline on GitHub eliminates a build hop. - GH Actions auto-provides GITHUB_TOKEN with packages:write via the workflow permissions block — no PAT plumbing or login secrets. - docker/setup-qemu-action and docker/setup-buildx-action handle the multi-arch cross-build setup that Woodpecker would require manual host configuration for. Trigger: any tag matching refs/tags/v*. Mirror sync from somegit.dev propagates tags to GitHub, so 'git push origin v0.3.1' on the canonical remote still drives the GitHub-side release. v0.3.1-rc1	2026-05-24 16:45:17 +02:00
vikingowl	047924da2b	ci(woodpecker): release pipeline on vX.Y.Z tag Runs 'go test ./...' then 'goreleaser release --clean' inside the official goreleaser image when a tag matching refs/tags/v* is pushed. GITHUB_TOKEN comes from the 'github_token' repo secret (needs repo + write:packages scopes) and is reused for ghcr.io docker login so the multi-arch image build can push. Runner requirements documented inline: docker socket access plus QEMU registered on the host (tonistiigi/binfmt --install all) for arm64 cross-builds. Directory form chosen so a non-release CI pipeline can land later under .woodpecker/ci.yml without restructuring.	2026-05-24 16:38:24 +02:00
vikingowl	a23eb6b92c	style: gofmt drift from prior commits Pure whitespace cleanup surfaced when 'make check' ran gofmt over the tree. Mostly struct-field column alignment in internal/safety/banner.go (SessionInfo) and the var(...) flag block in cmd/gnoma/main.go after --dangerously-allow-anywhere was added without realignment. Verified zero substantive changes via 'git diff --ignore-all-space --ignore-blank-lines'.	2026-05-24 16:33:17 +02:00
vikingowl	0981fb82d6	chore(make): add govulncheck and semgrep to 'make check' Both checks already passed locally on the current dev tip; wiring them into the canonical pre-commit gate so security regressions fail fast instead of leaking into a release. - 'make vuln' runs govulncheck with reachability analysis against the Go vuln DB. - 'make sec' runs semgrep with p/golang + p/security-audit, metrics off, --error so findings exit non-zero. Tools must be installed locally (commands in Makefile comments). If upstream Woodpecker CI runs 'make check', it will need both binaries on the runner image.	2026-05-24 16:30:54 +02:00
vikingowl	3888966e68	fix(deps): bump golang.org/x/net to v0.55.0 to clear reachable CVEs govulncheck flagged two reachable vulnerabilities in golang.org/x/net@v0.52.0: - GO-2026-5026 (idna fails to reject ASCII-only Punycode labels), reached via router.DiscoverOllama -> http.Client.Do -> idna.ToASCII. - GO-2026-4918 (HTTP/2 transport infinite loop on bad SETTINGS_MAX_FRAME_SIZE), same call path -> http2.Transport.*. Bumping to v0.55.0 covers both. Transitive bumps to x/crypto v0.51.0, x/sys v0.45.0, x/text v0.37.0. Post-bump govulncheck reports 0 reachable vulnerabilities and 0 in directly imported packages.	2026-05-24 16:27:28 +02:00
vikingowl	847cd5fe0c	fix(security): use crypto/rand for session-ID suffix Semgrep flagged math/rand for the /tmp artifact-directory session-ID generation. Modern Go (1.20+) auto-seeds the global math/rand source so this wasn't exploitable in practice, but crypto/rand is the idiomatic choice for any security-adjacent identifier and removes the finding from future security audits. Drops the mrand alias entirely; reads 8 random bytes once and masks to 24 bits to preserve the existing %06x suffix format.	2026-05-24 16:22:50 +02:00
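A minimal sketch of the approach described in the commit above: read 8 bytes from crypto/rand once, mask to 24 bits, and format with %06x. The function name and the choice of big-endian decoding are assumptions of this sketch; the commit message does not state either.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// sessionSuffix returns a 6-hex-digit suffix derived from crypto/rand.
// Hypothetical helper name; only the "8 bytes, 24-bit mask, %06x" shape
// comes from the commit message.
func sessionSuffix() (string, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	// Keep the low 24 bits so the value always fits in six hex digits.
	n := binary.BigEndian.Uint64(buf[:]) & 0xFFFFFF
	return fmt.Sprintf("%06x", n), nil
}

func main() {
	s, err := sessionSuffix()
	if err != nil {
		fmt.Println("random source unavailable:", err)
		return
	}
	fmt.Println("suffix:", s)
}
```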
vikingowl	001865f069	fix(env): correct ANTHROPIC_API_KEY typo, add missing vars The placeholder ANTHROPICS_API_KEY (with trailing S) silently failed: the auth layer reads ANTHROPIC_API_KEY, so anyone copying .env.example to .env and pasting their key would see gnoma never pick it up, with no clear error. Also surfaces vars that already work but weren't templated: GOOGLE_API_KEY (alternative to GEMINI_API_KEY), GNOMA_PROVIDER and GNOMA_MODEL (config overrides), and the two subprocess sandbox bypass footguns (GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX), left commented out so they don't accidentally turn on.	2026-05-24 16:16:39 +02:00
vikingowl	c1c52f139d	docs(readme): add 'no phone-home' bullet and data-flow scope note Clarify that gnoma itself emits no telemetry to external services while being explicit that cloud-provider arms send data to those providers by design. Adds: - 'No phone-home' bullet to the differentiator list, naming the on-device path (Ollama/llama.cpp + --incognito). - 'Data flow' paragraph to the Security scope-note blockquote so the framing is consistent between the hero bullets and the Security section.	2026-05-24 16:00:40 +02:00
vikingowl	7040041f13	docs(readme): correct firewall scope; track egress controls in TODO The 'What makes gnoma different' bullet and Security section both implied a network-egress firewall. Today the Firewall only enforces a content boundary (secret scan, Unicode sanitize, redact/block). Reword both spots and add a Scope note. Surface the gap as a top-of-TODO entry covering per-session audit log and per-host egress allowlist, with the open design question (host-level vs per-tool) called out. Raised via r/SideProject v0.3.0 launch thread.	2026-05-24 15:50:35 +02:00
vikingowl	1828151162	docs(claude): big-picture architecture and expanded test commands Add a 'Big picture' section summarising the request flow (cmd → session → engine → router → security/permission → extensibility) so future Claude Code instances can orient without reading INDEX.md plus five package directories first. Note that internal/safety and internal/slm aren't in INDEX.md yet. Document the somegit.dev / GitHub mirror split and the ruleset that blocks force-push and deletion on main/dev. Expand build/test section with make check, make test-integration, single-test, and benchmark commands.	2026-05-24 15:39:23 +02:00
vikingowl	b5062d59e9	docs(readme): hero screenshot, differentiators, status, TOC Add docs/img/gnoma-tui.png as a hero image so visitors see the TUI above the fold instead of a wall of text. Pull the bandit router, prefer-policy, SLM, and built-in firewall out of buried sections into a 'What makes gnoma different' bullet list. Add a Status block flagging pre-1.0 and a table of contents. Move the pygmy-owl naming note and upstream/mirror URLs into a footer About section.	2026-05-24 15:39:14 +02:00
vikingowl	b13a6a2801	docs(plans): mark v0.3.0 plans shipped Three plans shipped end-to-end in v0.3.0; removing them from TODO.md In-flight and adding a Status: shipped header to each plan doc with the commit references. Shipped: - 2026-05-23-routing-defaults-refresh.md - 2026-05-23-prefer-routing-policy.md - 2026-05-23-startup-safety-banner.md Still in flight (telemetry-gated, fires only if measurements support it): - 2026-05-23-tool-router-specialization.md	2026-05-23 22:45:05 +02:00
vikingowl	8ba77c1685	fix(safety): env-template precision, label alignment, banner on bypass Three polish items surfaced during the maintainer's manual smoke of the previous safety commit. env-template precision (false-positive fix): The "env file" rule matched .env.* universally, which flagged conventional templates like .env.example / .env.sample / .env.template / .env.dist / .env.default — these hold variable NAMES, no values, and are commonly committed. Now skipped. Real env files (.env, .env.local, .env.production) still match. New envTemplateSuffixes table + isEnvTemplate helper; check runs only inside the env-file rule so the suffix denylist is scoped. Tests added for both directions: 6 templates that must NOT flag, 6 real env files that must. Banner label alignment: Field labels were padded to 8 chars except "sensitive" at 9, producing visible misalignment in the rendered banner: cwd : /... provider : ollama / ... sensitive : 0 matches in cwd <- one extra space Padded all labels to 9 chars so the ":" separators line up. Context banner on bypass: --dangerously-allow-anywhere previously suppressed the entire safety block, including the informational context banner. Bypassing the GATE is not the same as opting out of the info — the user still wants to see cwd / git state / sensitive files nearby. Restructured the safety block so classification + banner always run; the bypass only skips the refuse/warn FLOW. The bypass warning log now also includes the classified tier and cwd path for diagnostics. v0.3.0	2026-05-23 22:32:26 +02:00
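The env-template exclusion in the commit above could look roughly like this. The names envTemplateSuffixes and isEnvTemplate appear in the commit message, but their bodies are not shown there, so everything below is an assumed reconstruction; isEnvFile is a hypothetical stand-in for the "env file" rule, and only the five suffixes spelled out in the message are listed.

```go
package main

import (
	"fmt"
	"strings"
)

// Suffixes named in the commit message. The real table may hold more entries.
var envTemplateSuffixes = []string{".example", ".sample", ".template", ".dist", ".default"}

// isEnvTemplate reports whether name ends in a known template suffix.
func isEnvTemplate(name string) bool {
	for _, suffix := range envTemplateSuffixes {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

// isEnvFile is a hypothetical stand-in for the "env file" rule: it matches
// .env and .env.*, and consults the template check only inside that rule.
func isEnvFile(name string) bool {
	if name != ".env" && !strings.HasPrefix(name, ".env.") {
		return false
	}
	return !isEnvTemplate(name)
}

func main() {
	for _, name := range []string{".env", ".env.local", ".env.production", ".env.example", ".env.dist", ".envrc"} {
		fmt.Printf("%-16s flagged=%v\n", name, isEnvFile(name))
	}
}
```

Scoping the suffix check to the env-file rule means a file such as notes.example is never affected by the template list.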
vikingowl	c483656681	docs(plans): fix gnoma one-shot invocation in safety-banner plan gnoma takes the prompt as a positional argument, not via -p (that's Claude Code's syntax). Surfaced when the maintainer tried the manual smoke from the plan's "Definition of done" section and hit the "flag provided but not defined: -p" error. before: gnoma -p "test" after: gnoma "test" The same wrong syntax appears in the `f9094f6` / `3eeb5b4` commit messages but those are immutable. This commit also serves as the public record of the typo so future readers don't repeat it.	2026-05-23 22:26:56 +02:00
vikingowl	d206b3cf09	docs: routing-prefer + startup-safety user docs, plan tier-shift note README: - New "Preferring local vs cloud" subsection under "Routing defaults" — table of the three [router].prefer values, priority order against forced arm / incognito / Strengths, and the CLI-agent-counts-as-local clarification. - New "Startup safety check" subsection under "Security" — tier table, [safety] config block, --dangerously-allow-anywhere flag, container detection note, link to the plan doc. Plan doc (prefer-routing-policy): - Approach section updated to describe the tier-shift mechanism that actually shipped, with a clear "Implementation note" explaining why the original score-multiplier approach was abandoned (cost-floor math gives local arms a ~280x raw-score advantage that any reasonable multiplier can't overcome). - CLI-agent placement flipped from "non-local" to "local" with rationale — implementation chose user-facing behavior axis over the privacy axis the original draft used. - Tier-shift rationale table replacing the multiplier rationale. - P-3 task rewritten to reflect the actual implementation (checked off and pointing at the right code), with the policyMultiplier helper noted as a within-tier nudge of limited present effect. The implementation-vs-plan deviation is now documented in both the plan doc and the original feature commit message (`f9094f6`). Future readers reach the same understanding via either path.	2026-05-23 22:23:57 +02:00
vikingowl	3eeb5b46d7	feat(safety): pre-launch cwd classifier + context banner Implements S-1 through S-7 of the startup-safety-banner plan. Adds a pre-launch safety check that classifies the current working directory into three tiers and gates the launch: TierRefuse /, /etc, /sys, /proc, /usr, /var, /bin, /sbin, /boot, /root, /dev (Linux) and /System, /Library, /private, /Applications (macOS). Refuses with exit 2 unless --dangerously-allow-anywhere is passed. TierWarn $HOME, ~/Desktop, ~/Downloads, ~/Documents, ~/.config, ~/.local, ~/.cache, /tmp, and similar dumping grounds. Prints a banner and reads a single y/Y from stdin to confirm; any other input (or EOF, including piped/ scripted invocation) aborts with exit 1. TierOK Anywhere with a recognized project marker (.gnoma/, go.mod, package.json, pyproject.toml, Cargo.toml, Makefile, Dockerfile, build.gradle, pom.xml) or inside a git repo. No prompt; banner only. Project markers and git-repo presence override the TierWarn check — a project dir inside $HOME stays TierOK. The require_project_marker config knob can flip that for strict users. Container detection: when /.dockerenv or /run/.containerenv exists, TierRefuse downgrades to TierWarn (devcontainers often chroot to / or similar). Best-effort; false positives only soften the gate. The context banner is always rendered (TierOK, TierWarn, TierRefuse alike) and summarizes: cwd, git branch + dirty state, project type, provider/model, modes (permission, incognito, prefer), and a top-level sensitive-file inventory. Inventory matches .env, .env., env.local; private-key extensions (.pem, .key, .crt, .p12, .pfx); SSH key names (id_rsa, id_ed25519, ...); credentials files; .netrc / .pgpass; KeePass vaults; and .ssh/ .aws/ .kube/ .gcloud/ .azure/ .docker/ directories. Precision-tested: .envrc and secret_handler.go do NOT match. Bounded at 1000 entries. 
Architecture: - internal/safety/cwd.go — Classification + symlink-resolving tier classifier with platform-specific roots and container detection. - internal/safety/sensitive.go — pattern-based top-level scanner, deterministic ordering, scanLimit guard against pathological dirs. - internal/safety/banner.go — pure render functions for the warn prefix, refuse message, and context banner. Safe for golden-string testing. - internal/config/config.go — new [safety] section with three config keys, defaults applied via ResolvedSafety() helper. Pointer fields distinguish "user omitted" from "user set to false." - cmd/gnoma/main.go — gate runs after subcommand dispatch (so `gnoma providers / profile / slm / router` skip the prompt) and before provider creation. --dangerously-allow-anywhere bypasses the gate with an explicit log warning. The runtime keypress reads up to 8 bytes from os.Stdin and accepts only "y" / "Y" trimmed; EOF returns false (piped invocations without the flag will abort). Documented in the readYesConfirmation helper. Manual smoke (per plan): - `cd / && gnoma -p test` → refuses - `cd ~ && gnoma` → warns + keypress - `cd ~/git/some-repo && gnoma` → banner only - subcommands skip the gate entirely Linux + macOS classification; Windows path handling deferred per plan (treated as TierOK there until follow-up). Refs: docs/superpowers/plans/2026-05-23-startup-safety-banner.md	2026-05-23 22:19:39 +02:00
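The keypress confirmation described in the commit above (read up to 8 bytes, accept only a trimmed "y" or "Y", treat EOF as "no") can be sketched like this. The helper name readYesConfirmation comes from the commit message; taking an io.Reader instead of os.Stdin, and the confirm wrapper, are assumptions made so the sketch runs without a terminal.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readYesConfirmation reads at most 8 bytes and returns true only when the
// trimmed input is exactly "y" or "Y". EOF or any read that yields no bytes
// counts as "no", which is what makes piped invocations abort.
func readYesConfirmation(r io.Reader) bool {
	buf := make([]byte, 8)
	n, _ := r.Read(buf)
	if n == 0 {
		return false
	}
	answer := strings.TrimSpace(string(buf[:n]))
	return answer == "y" || answer == "Y"
}

// confirm is a hypothetical convenience wrapper for demonstration.
func confirm(input string) bool {
	return readYesConfirmation(strings.NewReader(input))
}

func main() {
	for _, in := range []string{"y\n", "Y", "yes\n", "n\n", ""} {
		fmt.Printf("%-8q -> %v\n", in, confirm(in))
	}
}
```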
vikingowl	f9094f68f3	feat(router): [router].prefer = local | cloud | auto Implements P-1 through P-6 of the prefer-routing-policy plan. Adds a config knob that biases routing toward local arms, cloud arms, or leaves selection unchanged. Default "auto" is byte-identical to pre-change behavior (the new armTier path with PreferAuto returns the same value as the old single-arg function). Mechanism diverged from the plan after empirical testing: The plan called for a score multiplier applied in bestScored. Tests revealed the existing cost-floor math (scoreArm divides by weighted cost which collapses to ~0.001 for free local arms) gives local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier can't overcome. A tier-shift in armTier turned out cleaner: PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent) get +2 tier shift, landing behind locals. PreferCloud: IsLocal arms get +2 tier shift, landing behind cloud. SLM tier-0 arms shift to tier 2 — still below cloud's tier 3 — so the SLM-protection semantic (small stuff stays on the small model) survives PreferCloud. This matches the open question in the plan, now resolved as: yes, SLMs keep winning under PreferCloud by design. The policyMultiplier was kept in bestScored as a within-tier nudge (mostly cosmetic in practice given the cost-floor dynamics described above; could matter when costs are calibrated). Worth revisiting once router-wide cost calibration lands. Strengths cross-tier promotion is unaffected: the promoted-set path in selectBest bypasses armTier entirely, so a strongly-tagged cloud arm still wins SecurityReview tasks under PreferLocal (validated by TestPreferPolicy_StrengthsBeatsMultiplier). CLI-agent subprocess arms count as "local" for PreferLocal purposes — they proxy to cloud but the user-visible behavior is local. Users who want to exclude them can use --provider X. 
Forced arms (--provider X) and incognito take priority over the policy: forced arm test pins this, incognito-still-wins test pins the LocalOnly hard filter dominating PreferCloud. Test coverage (prefer_test.go): ParsePreferPolicy / String round trips; policyMultiplier table; acceptance scenarios across all three policies with adjacent-tier arms; SLM-still-wins under PreferCloud; Strengths beats multiplier; forced-arm bypass; incognito beats prefer; lone cloud arm wins when no local feasible. Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md	2026-05-23 22:13:26 +02:00
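The tier-shift mechanism from the commit above can be illustrated with a small sketch. The names armTier, PreferLocal, PreferCloud, PreferAuto, IsLocal and IsCLIAgent appear in the commit message; the Arm struct layout and the BaseTier field are assumptions, and only two base tiers (SLM at 0, cloud at 3) are stated in the source. A lower tier is tried first.

```go
package main

import "fmt"

type PreferPolicy int

const (
	PreferAuto PreferPolicy = iota
	PreferLocal
	PreferCloud
)

// Arm is a simplified stand-in; BaseTier is a hypothetical field holding the
// tier the arm would get with no preference applied.
type Arm struct {
	Name       string
	BaseTier   int
	IsLocal    bool
	IsCLIAgent bool
}

// armTier applies the +2 shift described in the commit message.
func armTier(a Arm, p PreferPolicy) int {
	tier := a.BaseTier
	switch p {
	case PreferLocal:
		// Only true API arms are pushed back; CLI agents count as local.
		if !a.IsLocal && !a.IsCLIAgent {
			tier += 2
		}
	case PreferCloud:
		if a.IsLocal {
			tier += 2
		}
	}
	return tier
}

func main() {
	slm := Arm{Name: "slm", BaseTier: 0, IsLocal: true}
	cloud := Arm{Name: "cloud", BaseTier: 3}
	for _, p := range []PreferPolicy{PreferAuto, PreferLocal, PreferCloud} {
		fmt.Printf("policy=%d slm=%d cloud=%d\n", p, armTier(slm, p), armTier(cloud, p))
	}
}
```

Under PreferCloud the SLM arm lands on tier 2, still ahead of the cloud arm's tier 3, which is the SLM-protection behavior the commit describes.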
vikingowl	162c8b1017	docs(plans): prefer-routing-policy and startup-safety-banner Two parallel pre-flight plans surfaced in the 2026-05-23 session, both deferred while the routing-defaults-refresh implementation landed. Drafted as separate plans because they're independent: the prefer-policy is a router scoring change; the safety banner is a launch-time check that never touches the router. prefer-routing-policy [router].prefer = "local" | "cloud" | "auto" — soft score multiplier (0.3 / 0.5 / 1.0) biasing toward local or cloud arms while preserving Strengths cross-tier promotion and bandit learning. Default "auto" is byte-identical to current behavior. Forced arms and incognito retain priority. CLI-agent subprocess arms count as non-local for this knob (they proxy to cloud). startup-safety-banner Three-tier cwd classification at launch — refuse in /etc /sys and other system roots; warn+keypress in $HOME, /tmp, ~/Desktop, ~/Downloads; OK inside any git repo or directory with a project marker (.gnoma/, go.mod, package.json, etc.). Always shows a context banner with cwd, git state, model, modes, and a top-level sensitive-file inventory (.env, id_rsa, *.pem, .ssh/, etc. — informational only, no recursion, capped at 1000 entries). Bypass via --dangerously-allow-anywhere. Complements the in-flight sensitive-content unified-policy TODO item: this is the pre-flight layer, that is the runtime input-path layer. Both plans default-on with safe defaults; both have explicit out-of-scope sections to prevent scope creep during implementation. Linux + macOS first; Windows path classification deferred. TODO.md surfaces both as in-flight.	2026-05-23 22:00:21 +02:00
vikingowl	c99b2c64ad	docs(readme): document routing defaults table and [[arms]] overrides Closes R-8 of the routing-defaults plan. Adds a new "Routing defaults" section between Config and SLM that documents what arms ship with out-of-the-box — the family-keyed Strengths / MaxComplexity / CostWeight matrix plus the non-chat exclude list. Also introduces the [[arms]] override block in the README for the first time (previously undocumented), showing how users keep priority over the defaults. Links back to the plan doc for the benchmark sources and per-entry rationale.	2026-05-23 21:42:05 +02:00
vikingowl	2f8d4c412f	feat(router): cloud-arm defaults, gpt-5.3-codex registration Closes R-4 and R-5 of the routing-defaults plan. R-4: Strengths + CostWeight defaults for closed frontier models. Cloud entries land in the same knownFamilyDefaults table as local ones, with MaxComplexity intentionally left zero (cloud arms get no complexity ceiling). CostWeight tuned per the plan's rationale: claude-opus-4-7 → Planning/SecurityReview/Debug/Refactor, 0.3 claude-sonnet-4-6 → Generation/Refactor/Review, 0.7 gpt-5.5 → Planning/SecurityReview/Generation, 0.3 gpt-5.3-codex → Generation/Refactor/Debug/UnitTest, 0.6 gpt-5.2 → Orchestration/Review, 0.8 gemini-3.1-pro → Planning/Review/Orchestration, 0.5 gemini-3.5-flash → Boilerplate/Explain/Orchestration, 1.2 The 0.3 weight on frontier arms keeps them competitive on SecurityReview / Planning despite $4+/Mtok; 1.2 on Gemini Flash penalizes cost more so it only wins when cost is genuinely decisive (boilerplate, explain). Mechanism: extracted applyFamilyDefaults into defaults.go and call it from Router.RegisterArm. Single source of truth — both local discovery and the primary-provider path in cmd/gnoma/main.go now flow through the same defaults application. Removed the duplicate apply block from RegisterDiscoveredModels. Legacy model IDs (claude-opus-4-20250514, gpt-4o, o3, gemini-2.5-pro, etc.) intentionally do not match any table entry — keeps users on pinned older models safe from imposed 2026 Strengths. R-5: gpt-5.3-codex registration. - internal/provider/openai/provider.go: added to fallbackModels and inferOpenAIModelCapabilities (400K context, 32K output). - internal/provider/ratelimits.go: gpt-5.3-codex and its dated alias gpt-5.3-codex-2026-02-15 added with the same Tier 1 quotas as gpt-5.2. Gemini 3.x (3.1-pro-preview, 3.5-flash, 3.1-flash-lite) was already registered in both google/provider.go and ratelimits.go — no change needed for that part of R-5. 
Test coverage: - ResolveFamilyDefaults table-driven across all 7 cloud entries including prefix-sharing (gpt-5.5-pro → gpt-5.5 defaults, gemini-3.1-pro-preview → gemini-3.1-pro defaults). - Legacy IDs return !ok. - RegisterArm applies cloud defaults end-to-end. - User-supplied Strengths and CostWeight are not overridden. - ID.Model() fallback works when ModelName is empty (test code often constructs arms this way). Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md	2026-05-23 21:39:48 +02:00
vikingowl	9bb775a4aa	feat(router): full local family defaults table with size-keyed ceilings Expands the family-defaults scaffold to 23 entries covering the local models that currently appear in real Ollama fleets: coder specialists (qwen3-coder, devstral, qwen2.5-coder, yi-coder, deepseek-coder, starcoder), reasoners (phi-4, phi-4-mini), Gemma 2/3/4 (including the "edge" e2b/e4b variants under both Ollama and GGUF naming), Qwen 2.5/3/3.5 with a catch-all qwen entry, Mistral/Ministral (incl. the 24B mistral-small-3), Llama 3.2/4, tiny3.5 (reec's distill family), Granite, GLM (incl. glm-ocr specialist), and MiniCPM-V. Five families that span wide parameter ranges (qwen3.5, qwen3, qwen2.5, ministral-3, tiny3.5) now use SizeCap ladders instead of a flat MaxComplexity. A new parseSizeFromModelID helper splits the model ID on :/-_/ and matches pure <N>b/<N>m tokens, correctly ignoring qwen3.5 version strings, e2b edge tags, a3b MoE active params, and v0.3 version suffixes. ResolveMaxComplexity wraps ResolveFamilyDefaults plus the SizeCap traversal, falling back to the smallest cap when size parsing fails (conservative). Discovery's apply path now goes through it so SizeCap entries actually take effect. 
Test coverage: - parseSizeFromModelID (11 cases) - ResolveFamilyDefaults longest-prefix discipline (19 cases) - Unknown-family fallback returns !ok - ResolveMaxComplexity size-keyed ladder (13 cases) - Size-parse-failure fallback - knownFamilyDefaults invariants: SizeCaps ordered largest-first, SizeCaps and MaxComplexity mutually exclusive per entry - Routing-payoff integration: 3 arms (tiny3.5:1.5b, phi-4:14b, qwen3-coder:30b) get picked for TaskGeneration / TaskPlanning / TaskBoilerplate respectively, without any [[arms]] config - Local fleet visibility: the maintainer's actual `ollama ls` inventory registers correctly with expected MaxComplexity and Strengths; embeddinggemma stays filtered out The Planning sub-case surfaced a separate issue worth flagging: heuristicQuality floors out at 0.55 for a generic 14B local model without ThinkingModes, below TaskPlanning's 0.60 threshold. The test mutates phi-4's capabilities post-registration to reflect reality (phi-4 is reasoning-tuned). A discovery-side thinking-capability detection is out of scope for this plan but flagged in the test comment for follow-up. Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md	2026-05-23 21:34:09 +02:00
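The size-token parsing described in the commit above might be sketched as follows. The helper name parseSizeFromModelID is from the commit message; the separator set (':', '/', '-', '_'), the regular expression, support for decimal sizes, and returning the size in billions are all assumptions of this sketch, chosen so that the exclusions listed in the message hold.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// A "pure" size token is digits (optionally decimal) followed by b or m.
// Tokens such as "qwen3.5", "e2b", "a3b" or "v0.3" start with a letter or
// lack the unit, so they do not match.
var sizeToken = regexp.MustCompile(`^(\d+(?:\.\d+)?)([bm])$`)

// parseSizeFromModelID returns the parameter count in billions, or false
// when no pure size token is present.
func parseSizeFromModelID(id string) (float64, bool) {
	tokens := strings.FieldsFunc(strings.ToLower(id), func(r rune) bool {
		return r == ':' || r == '/' || r == '-' || r == '_'
	})
	for _, tok := range tokens {
		m := sizeToken.FindStringSubmatch(tok)
		if m == nil {
			continue
		}
		n, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue
		}
		if m[2] == "m" {
			n /= 1000 // millions to billions
		}
		return n, true
	}
	return 0, false
}

func main() {
	for _, id := range []string{"qwen3-coder:30b", "tiny3.5:1.5b", "qwen3:30b-a3b", "gemma4:e2b", "qwen3.5"} {
		size, ok := parseSizeFromModelID(id)
		fmt.Printf("%-18s size=%v ok=%v\n", id, size, ok)
	}
}
```

When parsing fails, the commit says the caller falls back to the smallest cap; that fallback lives outside this helper and is not shown here.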
vikingowl	a79e99199d	feat(router): non-chat exclude, vision prefixes, family-defaults scaffold Discovery previously registered every model returned by Ollama as a chat arm, including embeddings, ASR, TTS, audio realtime, and rerankers — which then failed at inference time when the router selected them. Local arms also shipped with all-zero defaults, so selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b was effectively random. This change covers tasks R-1, R-2, R-6 from the routing-defaults plan. - nonChatModelPatterns + isNonChatModel substring matcher; matched IDs are skipped during RegisterDiscoveredModels. Covers whisper, moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding, embeddinggemma, -reranker, lfm2. - knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3 and minicpm-v entries stay for regression coverage. - New internal/router/defaults.go with FamilyDefaults struct, knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b resolves to "tiny3.5"). Single entry for now: functiongemma is registered with Disabled=true and MaxComplexity=0.40, reserved for the future ArmRoleToolRouter path. Table will grow in R-3. - RegisterDiscoveredModels consults ResolveFamilyDefaults and only populates fields that are still zero on the arm, so user [[arms]] overrides keep priority. Plans: - docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md - docs/superpowers/plans/2026-05-23-tool-router-specialization.md TODO.md surfaces both as in-flight items.	2026-05-23 21:24:59 +02:00
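The longest-prefix lookup with org-namespace stripping from the commit above can be sketched as below. FamilyDefaults and knownFamilyDefaults are named in the commit message, but their fields and contents here are assumptions: only the functiongemma entry (Disabled=true, MaxComplexity=0.40) is taken from the source, the other keys are empty placeholders added to show the prefix discipline, and resolveFamily is a hypothetical name that also returns the matched key for demonstration.

```go
package main

import (
	"fmt"
	"strings"
)

// FamilyDefaults is a reduced stand-in with only two illustrative fields.
type FamilyDefaults struct {
	MaxComplexity float64
	Disabled      bool
}

var knownFamilyDefaults = map[string]FamilyDefaults{
	"functiongemma": {MaxComplexity: 0.40, Disabled: true}, // values from the commit message
	"tiny3.5":       {},                                    // placeholder entry
	"qwen":          {},                                    // placeholder catch-all
	"qwen3":         {},                                    // placeholder entry
}

// resolveFamily strips an optional "org/" namespace and returns the longest
// table key that is a prefix of the remaining model ID.
func resolveFamily(id string) (string, FamilyDefaults, bool) {
	if i := strings.LastIndex(id, "/"); i >= 0 {
		id = id[i+1:]
	}
	id = strings.ToLower(id)
	best := ""
	for key := range knownFamilyDefaults {
		if strings.HasPrefix(id, key) && len(key) > len(best) {
			best = key
		}
	}
	if best == "" {
		return "", FamilyDefaults{}, false
	}
	return best, knownFamilyDefaults[best], true
}

func main() {
	for _, id := range []string{"reecdev/tiny3.5:1.5b", "qwen3:8b", "qwen2.5:7b", "functiongemma:270m", "mystery:7b"} {
		family, d, ok := resolveFamily(id)
		fmt.Printf("%-22s family=%-14q defaults=%+v ok=%v\n", id, family, d, ok)
	}
}
```

Picking the longest matching key keeps a specific entry such as qwen3 from being shadowed by a shorter catch-all such as qwen, regardless of map iteration order.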

266 Commits