f321dabce3845ef6afa2d3608fd9aba75f8a9faa
266 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f321dabce3 |
feat(config): Phase 3 — gnoma doctor diagnostic command
Phase 3 of the 2026-05-24 config-migration plan. Read-only
diagnostic over config files. Pairs with `gnoma upgrade-config`
from the previous slice: doctor finds things upgrade-config
can't fix, upgrade-config fixes the things it can.
What doctor surfaces (severity-ranked):
error — file unreadable, file unparseable
warn — unknown top-level keys (decoder silently
ignores them today)
— invalid enum values (permission.mode,
router.prefer, slm.backend)
— explicit-zero pointer fields whose resolved
value diverges from the default (e.g.
max_tokens = 0 when default is 8192)
info — (reserved; current diagnostics are warn+)
What doctor does NOT yet surface:
- Per-field zero-spam inside a partially-set section
(e.g. user wrote [provider] default = "anthropic" with
no other fields — those are at Go zero but the
encoder's omitempty handles them on the next write).
Catching this requires per-key source-tracking that
BurntSushi's MetaData doesn't expose for nested
fields; tracked as a follow-up.
- Cross-file layering bugs (e.g. project file's
prefer = "" silently shadows global's prefer = "cloud").
That requires loading the full layered config and
diffing per-section — could be a follow-up to doctor,
or the per-project upgrade-config --all flow.
CLI surface (`cmd/gnoma doctor`):
gnoma doctor scan the project config
(default — cwd's .gnoma/config.toml)
gnoma doctor <path> scan a specific file
gnoma doctor --all-projects walk the registry, scan
global + every known project
gnoma doctor --json structured JSON to stdout
(severity as string, suitable
for CI/scripts)
exit code: 0 = clean, 1 = any warn/error
Help text: `gnoma -h` now lists `doctor` alongside the
other subcommands.
Implementation:
internal/config/doctor.go Severity, Finding, Doctor,
DiagnoseFile, DiagnoseFiles
(~150 lines).
internal/config/doctor_test.go 11 tests covering each
finding type + Severity.String.
cmd/gnoma/doctor_cmd.go CLI dispatch + JSON / text
rendering + exit code.
cmd/gnoma/doctor_cmd_test.go 5 tests for the CLI surface.
internal/config/load.go new ProjectConfigPathFor
helper for --all-projects
(constructs a project config
path from an arbitrary root
without chdir).
cmd/gnoma/main.go dispatch case + -h help text.
Severity.MarshalJSON is custom: encodes the int as its
lower-case name string ("warn" not 1) for stable CI
consumption. Tests assert on the string form.
End-to-end check on a synthetic config with multiple
findings:
$ gnoma doctor
warn ...:permission.mode invalid permission.mode "yes" ...
→ fix the value, or remove the line
warn ...:provider.max_tokens explicit zero for provider.max_tokens
(resolved to 0); the default is 8192. ...
warn ...:unknown_section unknown top-level key "unknown_section" ...
warn ...:unknown_section.foo unknown top-level key "unknown_section.foo" ...
exit: 1
$ gnoma doctor --json
[
{ "severity": "warn", "path": "...",
"key": "permission.mode", "message": "..." },
...
]
Quality pipeline:
gofmt -l . clean
go vet ./... clean
golangci-lint run ... 0 issues on touched packages
go test ./... all pass (only the pre-existing
TestStartBackend_Auto_NothingReachable
environmental failure remains)
Refs: docs/superpowers/plans/2026-05-24-config-migration.md
§ Phase 3.
|
||
|
|
56d7217668 |
feat(config): Phase 2 — project registry at ~/.config/gnoma/projects.json
Phase 2 of the 2026-05-24 config-migration plan. Adds a
per-user list of directories gnoma has been launched in.
Powers `gnoma doctor --all-projects` (Phase 3) and
`gnoma upgrade-config --all` (Phase 4 --all-projects), and
unblocks the cross-project session picker / stats features
called out in the original plan.
Schema (stable for v0.4.x):
{
"projects": [
{
"path": "/home/user/git/foo",
"first_seen": "2026-04-15T10:30:00Z",
"last_seen": "2026-05-24T19:23:00Z",
"session_count": 47
}
]
}
API:
- LoadRegistry() / LoadRegistryAt(path) — read from canonical
path or test-injected path. Missing file → empty registry, no
error. Corrupt file → error (silent zero-ing would let
broken files accumulate stale state).
- (*Registry).Record(projectRoot) — idempotent add/bump,
atomic save. Empty projectRoot is a programmer error.
- (*Registry).Prune(staleBefore time.Duration) — returns the
(sorted) list of pruned paths so callers can surface them
in user-facing output.
- RegistryFilePath() — exposes the canonical path for
inspection and `rm` workflows.
Implementation:
- internal/config/registry.go (~120 lines): the Registry
type with a sync.Mutex guarding Record/Prune. Saves
through the same writeAtomicBytes helper that
upgrade-config uses (temp file + sync + rename), so a
crash mid-write can never leave a half-written registry.
- internal/config/config.go: new SettingsSection type
under `[config]` (the gnoma-level settings home — future
log-level / telemetry flags will live here too). Field:
ProjectRegistry *bool, omitempty. nil = enabled (default
true, preserves v0.3.x behavior); *false = opt out.
- internal/config/resolve.go: ResolvedConfig gains
ProjectRegistry bool (single-bool mirror, no full
ResolvedSettingsSection since there's only one field).
nil → default true, *false → false. Mirrors the existing
nil→true convention used for SLM.RegisterAsArm.
- internal/config/defaults.go: populates
&projectRegistry{true} on Defaults().
- cmd/gnoma/main.go: calls LoadRegistry().Record(ProjectRoot())
right after the safety banner renders, gated on
resolved.ProjectRegistry. Failure is logged at Warn level
but never blocks startup. Also moved the resolved :=
cfg.Resolved() initialization to before the registry call
(was previously at line 346 for the WriteTool setup) so
the registry block can use it.
- README.md: new bullet under §Security explaining the
registry is purely local, never sent off-machine, and how
to opt out via [config].project_registry = false.
Tests (internal/config/registry_test.go, 14 cases):
- LoadRegistryAt: missing file → empty, valid file parses,
corrupt file errors.
- Record: new project adds, existing bumps LastSeen +
SessionCount, empty path errors, atomic-write hygiene
(no .tmp-* left), save→reload round-trip, creates
parent dir on first save.
- Prune: removes stale, keeps fresh, no-op when nothing
stale, reports sorted pruned paths, empty-registry no-op,
persists across reload.
- Plus 2 new resolver tests for the
default-true / explicit-false ProjectRegistry paths.
End-to-end smoke (user's actual environment):
$ cd /tmp/registry-smoke
$ echo "test" | gnoma --provider ollama
$ cat ~/.config/gnoma/projects.json
→ { "projects": [ { "path": "/tmp/registry-smoke", ... } ] }
$ printf '[config]\nproject_registry = false\n' \
> ~/.config/gnoma/config.toml
$ echo "test" | gnoma --provider ollama
$ ls ~/.config/gnoma/projects.json
→ No such file (opt-out working)
Not in this slice:
- gnoma doctor --all-projects (Phase 3): uses the registry
to enumerate projects, runs the per-file diagnostic from
Phase 1 + the upgrade-config cleaner.
- gnoma upgrade-config --all (Phase 4): walks the registry,
calls Upgrade on each.
- Cross-project session picker / stats: foundation is here,
UI work is follow-up.
Refs: docs/superpowers/plans/2026-05-24-config-migration.md
§ Phase 2.
|
||
|
|
da5b19c159 |
docs(help): list config + upgrade-config in gnoma -h
The `gnoma config` (set/keys) and `gnoma upgrade-config` subcommands shipped in |
||
|
|
86ae142dfe |
fix(upgrade-config): friendly "no such file" + add --global flag
First-run UX fix. `gnoma upgrade-config --dry-run` in a directory with no `.gnoma/config.toml` used to error with:
error: read config: open .../.gnoma/config.toml: no such file or directory
That's a hard error for what's actually a non-event. The cleanest user experience: tell the user there's no project config to upgrade, hint that they can pass an explicit path or use --global, and exit 0.
Changes:
1. `cmd/gnoma/upgrade_config_cmd.go::runUpgradeConfigCommand` now stats the target before calling `gnomacfg.Upgrade`. For the implicit project/global targets, a missing file produces a friendly exit-0 message. An explicit path the user typed is still a hard error (caller asked for that specific file, didn't get it).
2. New `--global` flag, symmetric with `gnoma config set --global`. The user-level config is where zero-spam actually accumulates over time (most users never have a project config) so this is the more useful default target in practice. `--global <path>` is rejected as mutually-exclusive.
3. Rewrote the flag-parsing loop to avoid a Go slice-aliasing bug discovered while writing the tests. The original implementation did `pathArgs = append(args[:i], args[i+1:]...)` inside a `for i, a := range args` loop, which aliases the underlying array and overwrites earlier `a` values on subsequent iterations. With `--global --dry-run` the `--dry-run` overwrote `args[0]`, so the second iteration read `--dry-run` as `a` for both the `--dry-run` and `--global` cases. The new code walks `args` once and accumulates into a fresh `pathArgs` slice, no aliasing.
Tests added in upgrade_config_cmd_test.go:
- TestRunUpgradeConfig_MissingProjectConfigIsFriendly
- TestRunUpgradeConfig_MissingGlobalConfigIsFriendly
- TestRunUpgradeConfig_GlobalFlagUpgradesGlobalConfig
- TestRunUpgradeConfig_GlobalWithExplicitPathIsError
End-to-end check on the user's actual environment:
$ gnoma upgrade-config --dry-run
/home/.../gnoma/.gnoma/config.toml: no such file, nothing to upgrade
hint: pass an explicit path, or use --global for the user-level config
exit: 0
$ gnoma upgrade-config --global --dry-run
/home/.../.config/gnoma/config.toml: already clean, nothing to do (dry run)
exit: 0 |
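The aliasing hazard in change 3 can be reproduced in isolation. The two functions below are a reconstruction from the description in this commit message, not the actual contents of upgrade_config_cmd.go; `parseBuggy` and `parseFixed` are hypothetical names.

```go
package main

import "fmt"

// parseBuggy removes flags with append(args[:i], args[i+1:]...) while
// ranging over args. The append writes into args' own backing array, so
// later iterations read shifted elements.
func parseBuggy(args []string) (global, dryRun bool, pathArgs []string) {
	pathArgs = args
	for i, a := range args {
		switch a {
		case "--global":
			global = true
			pathArgs = append(args[:i], args[i+1:]...)
		case "--dry-run":
			dryRun = true
			pathArgs = append(args[:i], args[i+1:]...)
		}
	}
	return global, dryRun, pathArgs
}

// parseFixed walks args once and accumulates non-flag arguments into a
// fresh slice, so the input is never modified.
func parseFixed(args []string) (global, dryRun bool, pathArgs []string) {
	for _, a := range args {
		switch a {
		case "--global":
			global = true
		case "--dry-run":
			dryRun = true
		default:
			pathArgs = append(pathArgs, a)
		}
	}
	return global, dryRun, pathArgs
}

func main() {
	_, _, bad := parseBuggy([]string{"--global", "--dry-run"})
	_, _, good := parseFixed([]string{"--global", "--dry-run"})
	fmt.Printf("buggy path args: %q\n", bad)  // a flag leaks through as a path
	fmt.Printf("fixed path args: %q\n", good) // empty
}
```

In the buggy form the flag order also matters: with `--dry-run --global x`, the first removal shifts `--global` into index 0, which the loop has already passed, so the flag is never seen.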
||
|
|
70cd530578 |
feat(config): upgrade-config command + Duration pointer fix
Closes the two follow-up caveats from the 2026-06-04
config-migration follow-up plan:
Caveat 1 — Duration pointer conversion
SLM.StartupTimeout and SLM.ClassifyTimeout are now *Duration
(pointer) instead of bare Duration. nil = "use documented
default" (5s and 0s respectively); *Duration(0) = explicit
zero. ResolvedSLMSection added to the mirror so consumers
read resolved time.Duration values instead of the raw
pointer. cmd/gnoma/main.go, profile_cmd, and the SLM
startup wiring all move through the mirror. The remaining
cosmetic encoder issue (startup_timeout = 0 / classify_timeout
= 0 written even with omitempty) is fixed because the
BurntSushi encoder now sees a nil pointer when the user
didn't set the field.
ResolvedSLMSection's RegisterAsArm mirrors the existing
nil→true default-substitution semantics from the field's
doc comment; the if-nil check in main.go is collapsed to
a direct read of resolved.SLM.RegisterAsArm.
Caveat 2 — `gnoma upgrade-config` (single-file mode)
New command that cleans a config file in place: drops
pointer-converted fields whose resolved value matches the
resolved default, leaves explicit-zero pointer fields
alone (the "explicit zero preserved" contract from Phase 1),
and writes the cleaned form atomically with a
.bak-YYYYMMDD-HHMMSS backup of the original. Idempotent —
a second run on the cleaned file reports "already clean,
nothing to do" without creating a second backup.
Cleaning rules per field type (encoded in internal/config/
upgrade.go::clean):
- pointer-converted fields: null iff resolved value
equals resolved default
- non-pointer string / map / slice / numeric / bool
fields: encoder's omitempty already handles them on
rewrite; the cleaner doesn't touch them
Diff output uses a simple line-by-line algorithm (added/
removed/neutral) via splitLines + a forward scan. Adequate
for the small config files gnoma produces. A proper Myers
diff could be vendored later — pmezard/go-difflib is
already a transitive dep in go.sum.
internal/config/load.go::ProjectConfigPath is now exported
so the CLI can default the upgrade target to the project
config when no path is given.
--dry-run runs the upgrade then restores the file from the
backup so the operation is truly side-effect-free.
Scope notes
Single-file mode only. --all-projects is deferred until the
project registry (Phase 2 of the 2026-05-24 plan) lands —
the follow-up doc calls this out as the natural next slice
and it can be added as a follow-up PR without touching
upgrade-config's core semantics.
No-op test cases (TestUpgrade_NoChangesOnAlreadyCleanFile,
TestUpgrade_KeepsExplicitUserValues,
TestUpgrade_KeepsExplicitZeroPointerFields) assert the "resolved
view is identical before and after" contract.
Test coverage
internal/config/upgrade_test.go: 10 tests (drops, keeps,
backup, idempotency, diff, edge cases)
internal/config/resolve_test.go: +3 tests for ResolvedSLM
internal/config/write_test.go: +1 test for the Duration
emission fix
cmd/gnoma/upgrade_config_cmd_test.go: 3 tests for the CLI
Refs: docs/superpowers/plans/2026-06-04-config-migration-followups.md
|
||
|
|
db7a47012e |
docs(config): follow-up plan for Phase 1 caveats (Duration, pre-existing zero-spam, Bandit sentinel)
Captures the three caveats shipped in a9bba42's commit body as a tracked plan: Duration fields still emit as int64, pre-existing zero-spam isn't auto-cleaned, BanditSection keeps the 0-sentinel pattern. Sizes and orders the follow-ups so Phase 2/3/4 of the original config-migration plan stay decomposable into independent PRs. |
||
|
|
a9bba42c3d |
fix(config): stop generating zero-spam on setConfig; add Resolved mirror
The 2026-05-24 silent-corruption symptom: a `gnoma config set provider.default anthropic` call read the existing TOML into a zero-valued Config, set one field, then wrote the entire struct back. Every untouched field was serialized at its Go zero value (`mode = ""`, `max_tokens = 0`, etc.), and on the next layered load those present-but-zero fields silently shadowed higher-priority layers per TOML's "present field wins" semantics.
This is Phase 1 of the 2026-05-24 config-migration plan: encoder-side only. Phases 2-5 (registry, doctor, upgrade-config, auto-migration) follow in subsequent slices.
The fix is the hybrid approach the plan chose:
- `,omitempty` on every string / map / slice field so absent keys aren't re-emitted.
- Pointer conversion for the seven fields where the Go zero (`0`, `false`, `0.0`) is a legitimate user choice and the absent-vs-explicit-zero distinction matters: Provider.MaxTokens, Tools.MaxFileSize, Security.EntropyThreshold, Security.RedactHighEntropy, Router.ForceTwoStage, Session.MaxKeep, HookConfig.FailOpen. nil (absent) and *zero (explicit) are now distinguishable; the new Resolved() mirror substitutes Defaults() for nil so consumers see a clean concrete value.
- Defaults() populates the new pointer fields with their default values so the resolver substitution is a no-op for the common case of "user didn't set it".
- ResolvedConfig + Resolved() follow the ResolvedSafetySection precedent: a separate mirror type, constructed at the end of Load, with the boundary rule "raw cfg.X is internal; readers go through cfg.Resolved().X for pointer-converted fields".
- setConfig now uses an atomic temp+rename write (writeAtomicTOML) so a crash mid-write can't leave a half-written config file.
CLI surface: `gnoma config set [--global] <key> <value>` and `gnoma config keys` replace the dead help-string reference at cmd/gnoma/main.go:1538.
All consumers of pointer-converted fields (cmd/gnoma/main.go, cmd/gnoma/profile_cmd.go, internal/hook/, internal/plugin/) move to the Resolved mirror.
Test coverage: 6 resolver tests + 7 write tests + 3 CLI tests in the affected packages. Full go test ./... is green except for a pre-existing llamafile health-check timeout in internal/slm/backend_test.go that's environmental and unrelated to this change.
Caveats (carried as follow-up work, not blockers):
1. Duration-typed fields (SLM.StartupTimeout, SLM.ClassifyTimeout) still emit as raw int64 even at zero. BurntSushi's encoder doesn't honor omitempty on the custom Duration type without a MarshalText method, and the existing MarshalText-less Duration type predates this fix. Cosmetic-only: 0 is the documented "use default" sentinel for both fields, so the value is semantically correct. Fix is a separate pointer-conversion PR on those two fields.
2. Pre-existing zero-spam in user config files is not auto-cleaned by a setConfig call on a different key. The user's recovery path remains: re-set the affected key (which the new omitempty + pointer semantics now rewrite correctly), or run `gnoma upgrade-config` (Phase 4).
3. BanditSection keeps the documented 0-sentinel pattern (0 = "use built-in default"). Pointer conversion was deliberately out of scope per the plan.
Refs: docs/superpowers/plans/2026-05-24-config-migration.md |
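The encoder-side difference between a value field, an `omitempty` field, and a pointer field can be seen with the standard library. This sketch uses encoding/json as a stand-in for the BurntSushi TOML encoder (an assumption made to keep the example self-contained; the tag semantics shown are the same idea, not the same library), and the struct shapes are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacySection models the pre-fix shape: every field is written on
// every save, including untouched zero values ("zero-spam").
type legacySection struct {
	Default   string `json:"default"`
	Mode      string `json:"mode"`
	MaxTokens int    `json:"max_tokens"`
}

// section models the post-fix shape: strings carry omitempty, and the
// numeric field is a pointer so nil (absent) differs from an explicit 0.
type section struct {
	Default   string `json:"default,omitempty"`
	Mode      string `json:"mode,omitempty"`
	MaxTokens *int   `json:"max_tokens,omitempty"`
}

// encode marshals v and panics on error to keep the demo short.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	zero := 0
	fmt.Println(encode(legacySection{Default: "anthropic"}))
	fmt.Println(encode(section{Default: "anthropic"}))
	fmt.Println(encode(section{Default: "anthropic", MaxTokens: &zero}))
}
```

Only the legacy shape emits the untouched `mode` and `max_tokens` keys; the pointer form emits `max_tokens` only when the user explicitly set it, even to zero.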
||
|
|
f8ab522bef |
docs(todo,plans): specs for open features + MiniMax & ACP
Add implementation-ready plans for the in-flight features that lacked one, and two new provider/protocol items: - MiniMax provider (cloud arm + Token Plan billing decision) - Agent Client Protocol (ACP) — dual role: gnoma as ACP agent and as ACP client driving external agents as router arms - Network egress allowlist (Learn/Review/Enforce); note the per-session audit log is already implemented, remaining gap is a viewer command - Cross-platform (Windows/macOS) code touch-points + build-tag pattern - Distribution follow-ups (cosign, brew tap, installer, dockers_v2) Link each plan from its TODO.md entry; mark audit-log item done. |
||
|
|
98daebd359 |
docs(todo): cross-platform support — phase-breakdown + r/devops question map
Extends the cross-platform smoke-test entry surfaced 2026-05-28 into
a three-phase plan with concrete handles per concern:
Phase 1 — CI smoke matrix per tag (linux/darwin/windows × amd64/arm64).
Confirms the binary actually executes before any real bug-hunting.
Phase 2 — Windows-specific concerns mapped to the r/devops question
pattern u/HarjjotSinghh predicted ('crowd will ask within a week').
Each row: expected question, the gnoma-side gap it exposes, and the
rough fix scope. Covers PowerShell shell quoting, WSL vs native,
corporate-proxy / PAC support, Authenticode signing, MSI installer,
Event Viewer integration, Group Policy hooks, and air-gapped install
flow (ollama-dependency gap).
Phase 3 — macOS concerns: Apple-silicon launch sanity + Gatekeeper /
notarization warning on first run.
Pre-condition added for the eventual r/devops post: Phase 1 must be
in place before posting so the 'did you test it?' question has an
honest answer. Phase 2 items each need at least TODO acknowledgement
in the post body so the thread sees the gaps are tracked.
|
||
|
|
a468c3d2ed |
docs(readme,todo): origin paragraph + egress design refinement
README About section gets an Origin subsection describing how gnoma's security-first positioning emerged: not the original goal (provider-agnostic coding CLI was), but the answer to a gap that became obvious while building. Honest framing: 'the answer to what was missing, not the goal it set out with.'
TODO updates from the r/SideProject thread (u/HarjjotSinghh, 2026-05-28) refine the security-boundary egress entry with a three-stage Learn → Review → Enforce rollout (was previously just 'open design question: host-level vs per-tool'). Captures the default allowlist baseline (package ecosystems + model providers), the SDK-egress middle ground (sentry/stripe/supabase), and the per-tool scoping layer above the project-wide allowlist.
Also adds a new TODO entry for cross-platform smoke tests: Windows and macOS binaries ship every release but only Linux is exercised. Surfaced when answering 'are you planning a Windows build?' on the same thread and honestly couldn't claim the binaries are tested. |
||
|
|
7213a1e2fd |
docs: switch recommended SLM from reecdev/tiny3.5:500m to qwen3:0.6b
Empirical comparison on 2026-05-25 across three candidate SLMs on
identical prompts (two prompts: trivial 'what is 2+2' + knowledge
'explain a multi-armed bandit'):
qwen3:0.6b consistent across both prompts
functiongemma:270m works trivial, derails on knowledge prompts
gemma3:1b unusable (emits just '{' or invented keys)
reecdev/tiny3.5:1.5b unusable (ignores /no_think, leaks <Thought Process> blocks)
qwen2.5-coder:1.5b unusable (ignores classifier prompt, answers in prose)
qwen3:0.6b honours Qwen3's native /no_think flag (the distillation in
the old default did not), is smaller than the previous recommendation
(520 MB vs 1 GB), and was the only candidate to classify both test
prompts successfully without falling back to heuristic.
README quickstart block + slm-backends.md presets + status output
sample all switched. Also documents register_as_arm (default true,
set false for task-specialised models like FunctionGemma) and
classify_timeout (default 15s) in the example configs since both
landed in v0.3.3+.
Code defaults for the tiny3.5 family in internal/router/defaults.go
are unchanged — that table still applies when users have tiny3.5
registered as a routing arm independent of the SLM role.
v0.3.4
|
||
|
|
fd327107df |
fix(router/discovery): always probe ollama capabilities, cache is optional
DiscoverOllama() interpreted a nil probeCache as 'skip probing entirely' rather than 'probe but don't cache.' cmd/gnoma/main.go's synchronous discovery path passes nil, so every ollama-discovered model got SupportsTools=false (the Go zero value), regardless of what ollama actually reported in its capabilities field. The symptom: filterFeasible rejected every ollama arm for any tool-requiring task with reason=tools_required_but_unsupported, even when ollama itself reported the model as tool-capable. Verified via curl: qwen3:14b advertises capabilities=[completion, tools, thinking] and has 'tools' in its template, but the gnoma arm shipped with tool_use_capability=false. Fix: always run probeOllamaModel; treat probeCache as an optional memoisation aid only. nil cache now means 'no caching across calls' not 'no probing.' For users with many models, passing a real cache still avoids redundant HTTP calls — semantics for that path are unchanged. Surfaced via the new filterFeasible Debug logging from the previous commit, which made the per-arm rejection reasons visible. |
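The corrected contract (always probe, cache only if one is supplied) can be sketched as below. `discover` and `probeFunc` are hypothetical stand-ins for DiscoverOllama and probeOllamaModel; the real functions talk to ollama over HTTP, which is replaced here by an injected function.

```go
package main

import "fmt"

// probeFunc reports whether a model supports tools. In the real code
// this is an HTTP call; here it is injected.
type probeFunc func(model string) (supportsTools bool)

// discover probes every model. A nil cache means "no memoisation across
// calls", never "skip probing".
func discover(models []string, probe probeFunc, cache map[string]bool) map[string]bool {
	out := make(map[string]bool, len(models))
	for _, m := range models {
		if cache != nil {
			if v, ok := cache[m]; ok {
				out[m] = v
				continue
			}
		}
		v := probe(m)
		if cache != nil {
			cache[m] = v
		}
		out[m] = v
	}
	return out
}

func main() {
	// Fake probe: pretend only qwen3:14b advertises tool support.
	probe := func(model string) bool { return model == "qwen3:14b" }
	fmt.Println(discover([]string{"qwen3:14b", "gemma3:1b"}, probe, nil))
}
```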
||
|
|
0d3d190a8b |
fix(slm,session,router): classifier-only SLMs + session error recovery + feasibility diagnostics
Three coupled fixes that surfaced from a single FunctionGemma test session where the SLM-as-execution-arm assumption broke down and every subsequent prompt failed with 'session not idle (state: error)'.
(A) [slm].register_as_arm config. The SLM has always been unconditionally registered as both classifier AND tier-0 execution arm. Fine for general-purpose models (ministral, qwen3-chat); breaks for task-specialised models (FunctionGemma emits function-call syntax instead of prose; embedding models can't generate). New pointer-bool config: nil/absent preserves the historical default (true), explicit false makes the SLM classifier-only and the execution path skips the slm/* arm. Three table tests cover absent / explicit-false / explicit-true decode paths.
(B) Session error recovery. After any routing or engine error, the session moved to StateError and stayed there until restart: every new user prompt got rejected with 'session not idle (state: error)'. ResetError() was already wired for the /init retry path, but the general user-input and slash-command paths didn't call it. Added ResetError() before every user-initiated Send in the TUI so a fresh prompt always represents intent-to-retry. The /init internal retry already had its own ResetError; left alone.
(C) filterFeasible per-arm rejection logging. Today's 'no feasible arm for task X' error tells you THAT every arm was rejected but nothing about WHY. Added slog.Debug per rejection (arm, task, complexity, reason, the specific violated constraint) plus a summary line when zero arms are feasible at any quality. Visible with --verbose; quiet otherwise. Surface area expansion only; no behaviour change for users not chasing a bug. |
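Fix (B) can be illustrated with a toy state machine. Only `StateError`, `ResetError`, `Send`, and the error string come from the commit; `StateIdle`, the `Session` layout, the `submit` wrapper, and the use of an empty prompt to simulate an engine failure are assumptions for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// State is the session lifecycle state.
type State int

const (
	StateIdle State = iota
	StateError
)

func (s State) String() string {
	if s == StateError {
		return "error"
	}
	return "idle"
}

// Session is a toy model of the TUI session.
type Session struct{ state State }

// Send rejects input unless the session is idle. An empty prompt stands
// in for any routing or engine failure and moves the session to StateError.
func (s *Session) Send(prompt string) error {
	if s.state != StateIdle {
		return fmt.Errorf("session not idle (state: %s)", s.state)
	}
	if prompt == "" {
		s.state = StateError
		return errors.New("simulated engine error")
	}
	return nil
}

// ResetError returns an errored session to idle; it is a no-op otherwise.
func (s *Session) ResetError() {
	if s.state == StateError {
		s.state = StateIdle
	}
}

// submit models the fixed user-input path: a fresh prompt is treated as
// intent to retry, so the error state is cleared before sending.
func submit(s *Session, prompt string) error {
	s.ResetError()
	return s.Send(prompt)
}

func main() {
	s := &Session{}
	fmt.Println(s.Send(""))         // simulated failure
	fmt.Println(s.Send("hello"))    // old path: rejected, session stuck
	fmt.Println(submit(s, "hello")) // fixed path: recovers
}
```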
||
|
|
c065a2dea7 |
fix(provider/openai): wire ResponseFormat into OpenAI request params
provider.Request.ResponseFormat was being silently dropped by the openai translation layer (translate.go:translateRequest). The upstream provider type and the openai-go SDK both supported it; the adapter just never propagated it. This is why Move 1 (set ResponseFormat=ResponseJSON in the SLM classifier) produced zero observable change: the field made it from the classifier into provider.Request but stopped at the OpenAI translation step. The ollama backend (used via the OpenAI-compatible endpoint) therefore never received format=json_object and kept emitting free-form prose, which the classifier's downstream JSON parser duly rejected — 50 fallbacks in a row across two model swaps. Translate provider.ResponseJSON to oai.ResponseFormatJSONObjectParam and provider.ResponseText to oai.ResponseFormatTextParam; leave the union zero-valued when the caller didn't set ResponseFormat so the SDK omits the field per its omitzero tag. Three table cases cover the json / text / unset paths. Affects ollama, llama.cpp, llamafile, and any other backend reached via openaicompat — all run through openai.translateRequest. |
||
|
|
24945b1eb2 |
docs(plans): encoder + contextual-bandit router architecture
Captures the architectural research surfaced during the 2026-05-25 SLM-failure diagnostic session: RouteLLM treats routing as classification, ModernBERT is well-suited to that classification, and FunctionGemma fits as an optional JSON-sanity layer rather than the primary classifier. The current decoder-SLM-as-classifier design is the wrong shape (100% failure rate observed across two model swaps). Five-phase plan: 1. Embedding feature scaffold (near-term, additive, opt-in) 2. Contextual bandit (LinUCB / Thompson) over the feature set 3. Retire the decoder-SLM classifier once 2 outperforms 4. ModernBERT fine-tune on the accumulated labelled data 5. FunctionGemma JSON sanity layer (optional final stage) Phase 1 is the only piece scoped for near-term implementation; the rest is multi-month and hinges on the strategic 'EMA vs SLM' question already tracked in TODO. Cross-references the existing tool-router-specialization plan so a reader of either lands on both. Updates the TODO entry for the bandit selector to note the supersession path. |
||
|
|
c0c2e4bff5 |
fix(slm): enforce JSON output + strip thinking-block prefixes
Two structural fixes for the SLM classifier's 100% failure rate: (1) Pass ResponseFormat=json_object + Temperature=0 + TopP=1 + MaxTokens=128 in the classifier Request. The provider type already supports these but callSLM was leaving them unset, which meant ollama (and any other backend) ran with default sampling and free-form text output. format=json mode in particular makes ollama emit only valid JSON at decoding time — eliminates the majority of parse failures. (2) Harden extractJSON to strip common thinking-block tags before hunting for the brace. Seen in the wild: <think>…</think> (Qwen3 distillations) and <Thought Process>…</Thought Process> (tiny3.5). Defensive list also covers <reasoning>, <thoughts>. Unterminated thinking blocks fall back to brace-search so we still have a shot. Table-driven tests cover all variants plus the no-tag and fenced-json paths to confirm no regression. Even with format=json on a capable provider, the extractor is the safety net for backends that don't enforce format strictly — same defence-in-depth shape as the existing fence stripping. Doesn't fix the deeper architecture question (encoder + bandit preferred over decoder-SLM as classifier — see plan doc landing in the same PR); fixes the immediate bug. |
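Fix (2) can be sketched as a tag-stripping pass followed by a brace search. The tag list follows the commit text; the function bodies are an illustrative reconstruction, not the extractJSON in the repository, and the fence-stripping step mentioned above is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// thinkingTags lists the block names mentioned in the commit message.
var thinkingTags = []string{"think", "Thought Process", "reasoning", "thoughts"}

// stripThinking removes every terminated <tag>...</tag> block. An
// unterminated block is left in place so the brace search below still
// gets a chance.
func stripThinking(s string) string {
	for _, tag := range thinkingTags {
		openTag, closeTag := "<"+tag+">", "</"+tag+">"
		for {
			i := strings.Index(s, openTag)
			if i < 0 {
				break
			}
			j := strings.Index(s[i:], closeTag)
			if j < 0 {
				break
			}
			s = s[:i] + s[i+j+len(closeTag):]
		}
	}
	return s
}

// extractJSON returns the span from the first '{' to the last '}' after
// thinking blocks have been removed.
func extractJSON(s string) (string, bool) {
	s = stripThinking(s)
	start := strings.Index(s, "{")
	end := strings.LastIndex(s, "}")
	if start < 0 || end < start {
		return "", false
	}
	return s[start : end+1], true
}

func main() {
	raw := `<think>maybe {"task":"?"} or not</think>{"task":"chat"}`
	out, ok := extractJSON(raw)
	fmt.Println(out, ok)
}
```

Stripping before searching matters because a thinking block can itself contain braces, which would otherwise be picked up as the start of the payload.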
||
|
|
f3c70bd802 |
fix(slm,router): honest classifier diagnostics + 15s default timeout
Five fixes folded into one commit because they all answer the same question: 'why does my router stats output lie to me?'
Issue 1 (timeout). Default classify timeout was 5s, too short for cold-start ollama loads on small models. Bumped to 15s and surfaced as [slm].classify_timeout (0 = built-in default). Empirically caught when a user's reecdev/tiny3.5:1.5b hit 'stream error: context deadline exceeded' on every single classify call.
Issue 2 (Warn-level error). The SLM-fallback path logged the underlying error at Debug, invisible without --verbose. Promoted to Warn so a first-time misconfiguration surfaces immediately. The fallback itself is benign; the signal is that the SLM isn't doing the work it was supposed to.
Issue 3 (stats hint). Hard-coded 'check that llamafile boots' even when the user is on ollama. Replaced with backend-templated advice read from cfg.SLM.Backend. Also distinguishes three diagnostic cases that were collapsed before:
- SLM never called (zero attempts)
- SLM called N times but every call fell back (timeout/parse)
- SLM working but minority share
Issue 4 (effective heuristic share). The classifier breakdown shows 'heuristic' and 'slm_fallback' as separate sources, but both routed through HeuristicClassifier; only the source tag differs. New line under 'total observations' surfaces the combined share honestly: 'effective heuristic share: 100% (44 fallbacks + 10 pure heuristic)'.
Issue 5 (config schema). [slm].classify_timeout joins the existing [slm] knobs alongside startup_timeout. Documented inline with the cold-start-load rationale. |
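The arithmetic behind Issue 4 is small enough to show directly. The function name and parameter split are hypothetical; the sample figures (44 fallbacks, 10 pure heuristic, no successful SLM calls) are the ones quoted in the commit message.

```go
package main

import "fmt"

// effectiveHeuristicShare returns the percentage of observations that
// were actually classified by the heuristic, counting SLM fallbacks as
// heuristic because they ran through the same classifier.
func effectiveHeuristicShare(pureHeuristic, slmFallback, slmOK int) float64 {
	total := pureHeuristic + slmFallback + slmOK
	if total == 0 {
		return 0
	}
	return 100 * float64(pureHeuristic+slmFallback) / float64(total)
}

func main() {
	fallbacks, pure, slmOK := 44, 10, 0
	fmt.Printf("effective heuristic share: %.0f%% (%d fallbacks + %d pure heuristic)\n",
		effectiveHeuristicShare(pure, fallbacks, slmOK), fallbacks, pure)
}
```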
||
|
|
fa65a68728 |
docs(plans): config-migration and sensitive-content-policy
Promotes two TODO entries into phased plan docs and links them from the TODO bullets.
config-migration plan covers the silent layered-config corruption chain (encoder zero-spam -> reader overwrite -> wrong effective values) and its remediation across five phases: encoder fix (omitempty + pointer-numeric hybrid), project registry, gnoma doctor, gnoma upgrade-config, and auto-migration on startup with banner notice.
sensitive-content-policy plan unifies three input paths (pasted text, pasted images, tool-read files) behind one decision API with consistent UI surface and audit-log integration. Phases A-E sequence the work from highest-leverage (text paste) to most complex (image OCR with local vision arm).
Neither plan starts implementation in this commit; they exist to make the design decisions explicit so the eventual code can be reviewed against a written intent rather than a TODO bullet.
v0.3.3 |
||
|
|
8b9bdc2978 |
feat(security): per-session firewall audit log
New AuditLogger writes one JSON line per firewall action to <projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl so a user can grep 'what did the firewall do this session?' after the fact. Records 'block', 'redact', 'warn', and 'unicode_sanitize' events with the matcher name, source (tool_result / message_text / etc.), and token length. Discipline: never the bytes themselves — only the matcher name and the length, matching the README's scope-note promise about audit data. Plumbing: - Firewall gains an audit *AuditLogger field plus SetAudit setter. The firewall is constructed before the session ID exists, so the audit logger is wired post-hoc once main.go has the sessionID. - Honours incognito: Record is a silent no-op when the firewall's IncognitoMode is active, preserving the no-persistence contract. - Tolerant of fs errors: mkdir / open / encode failures log a Warn but never propagate; the scan pipeline must not depend on audit succeeding. - Nil receiver is a valid no-op so callers don't need nil-guards around every Record. Tracks 'Security boundary — per-session audit log' from the v0.3.0 r/SideProject launch thread (u/Secret_Theme3192, 2026-05-24). Per-host egress allowlist remains separately tracked pending the commenter's reply on host-level vs per-tool semantics. |
||
|
|
eea26a262e |
feat(router): surface bandit knobs as [router.bandit] config
Four hardcoded constants in the selector and feedback tracker are now
user-tunable via [router.bandit]:
- quality_alpha (EMA smoothing, default 0.3)
- min_observations (samples before observed overrides heuristic, default 3)
- observed_weight (observed/heuristic blend ratio, default 0.7)
- strength_bonus (quality bonus for Strengths-tagged arms, default 0.15)
Each field treats 0 as 'use default', so an empty TOML block is
byte-identical to pre-config behaviour. BanditParams is plumbed via
router.Config{Bandit: ...} and resolveBanditParams() centralises the
fallback so every call site shares the same defaults.
QualityTracker, scoreArm, bestScored, and selectBest signatures now
take the configured values directly rather than reaching for package-
level constants. Tests updated to pass BanditParams{} (defaults) or
explicit overrides where they validate the new tuning paths.
Tracks item #3 from the 'Bandit selector — design decisions deferred'
TODO entry — ships independently of the EMA vs SLM strategic decision.
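The "0 means use default" rule can be sketched as below. The struct field names are guesses derived from the four TOML keys; only the function name resolveBanditParams and the default values come from the commit message.

```go
package main

import "fmt"

// BanditParams mirrors the four [router.bandit] keys.
// Field names are assumed for this sketch.
type BanditParams struct {
	QualityAlpha    float64
	MinObservations int
	ObservedWeight  float64
	StrengthBonus   float64
}

// resolveBanditParams replaces every zero field with the documented default,
// so BanditParams{} (an empty TOML block) reproduces the old constants.
func resolveBanditParams(p BanditParams) BanditParams {
	if p.QualityAlpha == 0 {
		p.QualityAlpha = 0.3
	}
	if p.MinObservations == 0 {
		p.MinObservations = 3
	}
	if p.ObservedWeight == 0 {
		p.ObservedWeight = 0.7
	}
	if p.StrengthBonus == 0 {
		p.StrengthBonus = 0.15
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{}))
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{MinObservations: 10}))
}
```

One consequence of this convention is that a knob cannot be set to a literal 0; a value such as strength_bonus = 0 would be read as "unset" and replaced by 0.15.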
|
||
|
|
352cab4a94 |
docs(todo): extend config-migration plan with project registry
Adds item #5 to the config write/merge corruption entry: ~/.config/gnoma/projects.json tracking which directories gnoma has been launched in. Enables doctor --all-projects, cross-project session listing, and one-shot upgrade-config across all known projects.
Documents the design constraints: must use the same omitempty / atomic-write discipline as the encoder fix to avoid recreating the class of bug it exists to help solve. Privacy footprint flagged (local-only directory log; opt-out toggle). Stale-entry handling gated through doctor, not auto-prune.
v0.3.2
|
||
|
|
58f4001917 |
docs(todo): track config write/merge corruption + doctor/upgrade design
setConfig() serializes the entire Config struct on every key change, which writes zero-valued fields into the file. On the next load those explicit zeros override higher-priority layers via toml.Decode's present-beats-absent semantics. Concrete symptom today: a global prefer = 'cloud' was silently shadowed by a project prefer = ''.
Captures the multi-part fix surface so it doesn't get half-done:
- Stop generating zero-spam (omitempty hybrid or pelletier swap).
- gnoma doctor: read-only diagnostic (zero-spam, invalid enums, removed keys, effective-merged values).
- gnoma upgrade-config: active migration with .bak backup + diff.
- Auto-migrate project-level on startup with TUI banner notice; global stays explicit.
|
||
|
|
6c5e969217 |
feat(tui): add /router command for runtime routing-preference switch
Mirrors the pattern of /permission: bare command shows the current value plus a help line; with an argument (auto/local/cloud) it calls Router.SetPreferPolicy and emits a system message. Session-only — does not write back to config.toml, matching /permission and Ctrl+X incognito-toggle conventions. Tab completion on the value via routerPreferModes alongside the existing permissionModes pattern. Help text updated. Status-bar indicator deferred (separate concern if it turns out to be wanted). |
||
|
|
74bd570438 |
fix(tui): de-dupe /init in command picker; skill names shadow builtins
/init appeared twice in the completion picker — once from the static
builtinCommands list and once from the bundled init skill at
internal/skill/skills/init.md (registered via skills.All()).
Two changes:
- Remove /init from builtinCommands. The skill provides the canonical
entry, and its description ('Generate or update AGENTS.md project
documentation') is more accurate than the static one ('initialize
project — create AGENTS.md') because the skill handles both create
and update.
- Refactor completionSource() so a skill name silently shadows any
builtin with the same name. Prevents this from recurring if a
future builtin migrates to a skill, and lets users override a
builtin's description by dropping a skill of the same name into
.gnoma/skills/.
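A minimal sketch of the shadowing rule, assuming the completion list is built from two slices of name/description pairs. The `entry` type and `mergeCompletions` function are hypothetical stand-ins for whatever completionSource() does internally.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical completion item.
type entry struct {
	Name string
	Desc string
}

// mergeCompletions returns one entry per name. Skills are written into the
// map after builtins, so a skill silently replaces a same-named builtin.
func mergeCompletions(builtins, skills []entry) []entry {
	byName := make(map[string]entry, len(builtins)+len(skills))
	for _, b := range builtins {
		byName[b.Name] = b
	}
	for _, s := range skills {
		byName[s.Name] = s
	}
	out := make([]entry, 0, len(byName))
	for _, e := range byName {
		out = append(out, e)
	}
	// Map iteration order is random; sort for a stable picker.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	builtins := []entry{
		{Name: "/help", Desc: "show help"},
		{Name: "/init", Desc: "initialize project"},
	}
	skills := []entry{
		{Name: "/init", Desc: "Generate or update AGENTS.md project documentation"},
	}
	for _, e := range mergeCompletions(builtins, skills) {
		fmt.Printf("%s: %s\n", e.Name, e.Desc)
	}
}
```

Keying on the name rather than filtering a fixed list is what makes the fix general: any future builtin that migrates to a skill is de-duplicated without another code change.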
|
||
|
|
d38d7daf25 |
fix(subprocess/agy): disable ToolUse until stream-json lands
agy is registered with FormatAgyText and the agyParser emits every stdout line as a plain EventTextDelta. There is no path for a structured ToolCall event to come back. With ToolUse=true the router would dispatch tool-needing tasks (security_review, spawn_elfs, file edit) to agy; the underlying Gemini model would describe calling the tool in prose — invented UUIDs and 'I will pause now'-style stubs — the engine would receive only text, and the turn would hang waiting for a tool call that never arrives. Surfaced when /init routed to agy for a security_review task and elf spawning visibly hallucinated in the TUI. Capability flag flipped to false; agy stays usable for tool-free prompts (explain, summarize, simple chat). TODO entry for native stream-json updated to flag that the capability flip is part of that same change. |
||
|
|
06d4069076 |
ci: pin GoReleaser to the triggering tag, fix tag-collision regression
When v0.3.1 was tagged on the same commit as v0.3.1-rc2, the release workflow built and tried to publish rc2 artifacts instead of v0.3.1, failing with 'already_exists' on every asset upload.
Root cause: goreleaser-action@v6 + 'version: latest' (locked to v2.x) falls back to 'git describe --tags' for the current tag, which picked v0.3.1-rc2 over v0.3.1 when both refs pointed at HEAD.
Explicitly setting GORELEASER_CURRENT_TAG = github.ref_name forces the workflow to use the tag that triggered it, regardless of other refs at the same commit.
v0.3.1
|
||
|
|
f641bd4971 |
docs(todo): track bandit selector design questions
Two related items surfaced from the r/coolgithubprojects v0.3.1 launch thread. Bundled because they share the selector code: 1. Whether to keep numeric EMA at all post-SLM dispatcher (open strategic question from the 2026-05-07 roadmap — not a must-implement). 2. Surfacing hardcoded selector knobs (qualityAlpha, blend ratio, strength bonus, quality floor) as [router.bandit] config keys — ships independently of #1. |
||
|
|
798f2ab3c3 |
fix(release): prerelease auto-detect; changelog excludes scoped conventional commits
Two polish issues surfaced by the v0.3.1-rc1 pipeline test:
- The release was tagged v0.3.1-rc1 but published without the prerelease flag, so it appeared alongside stable releases. Add 'prerelease: auto' to release.github so GoReleaser marks any tag with a semver prerelease suffix (-rc, -beta, -alpha, -pre) appropriately.
- The changelog filters used '^docs:' patterns that only match bare conventional commits. Scoped variants like 'docs(readme):' and 'chore(make):' slipped through into the published changelog. Switch to '^docs[:(]' style patterns to match both forms, and add '^style[:(]' so gofmt-drift commits are excluded too.
v0.3.1-rc2
|
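The difference between the two changelog patterns is easy to check directly. GoReleaser is written in Go, so Go's regexp package is used here as a reasonable stand-in for its filter matching; the sample commit subjects are invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// bareOnly is the old filter: it needs a colon right after "docs".
	bareOnly = regexp.MustCompile(`^docs:`)
	// bothForms is the new filter: a colon or an opening parenthesis.
	bothForms = regexp.MustCompile(`^docs[:(]`)
)

func main() {
	subjects := []string{
		"docs: fix typo",
		"docs(readme): add TOC",
		"feat(docs): generate reference",
	}
	for _, s := range subjects {
		fmt.Printf("%q old=%v new=%v\n", s, bareOnly.MatchString(s), bothForms.MatchString(s))
	}
}
```

Inside a character class the parenthesis is a literal, so `[:(]` needs no escaping, and the `^` anchor keeps a scope such as `feat(docs)` from being excluded by accident.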
||
|
|
9814795b3c |
ci: migrate release pipeline from Woodpecker to GitHub Actions
Drop the broken .woodpecker/release.yml (top-level when: triggered an 'error' status on every dev push instead of skipping non-tag events) and replace with .github/workflows/release.yml driving the same GoReleaser flow.
Rationale:
- Release artifacts already land on GitHub (releases + ghcr.io), so running the pipeline on GitHub eliminates a build hop.
- GH Actions auto-provides GITHUB_TOKEN with packages:write via the workflow permissions block, with no PAT plumbing or login secrets.
- docker/setup-qemu-action and docker/setup-buildx-action handle the multi-arch cross-build setup that Woodpecker would require manual host configuration for.
Trigger: any tag matching refs/tags/v*. Mirror sync from somegit.dev propagates tags to GitHub, so 'git push origin v0.3.1' on the canonical remote still drives the GitHub-side release.
v0.3.1-rc1
|
||
|
|
047924da2b |
ci(woodpecker): release pipeline on vX.Y.Z tag
Runs 'go test ./...' then 'goreleaser release --clean' inside the official goreleaser image when a tag matching refs/tags/v* is pushed. GITHUB_TOKEN comes from the 'github_token' repo secret (needs repo + write:packages scopes) and is reused for ghcr.io docker login so the multi-arch image build can push. Runner requirements documented inline: docker socket access plus QEMU registered on the host (tonistiigi/binfmt --install all) for arm64 cross-builds. Directory form chosen so a non-release CI pipeline can land later under .woodpecker/ci.yml without restructuring. |
||
|
|
a23eb6b92c |
style: gofmt drift from prior commits
Pure whitespace cleanup surfaced when 'make check' ran gofmt over the tree. Mostly struct-field column alignment in internal/safety/banner.go (SessionInfo) and the var(...) flag block in cmd/gnoma/main.go after --dangerously-allow-anywhere was added without realignment. Verified zero substantive changes via 'git diff --ignore-all-space --ignore-blank-lines'. |
||
|
|
0981fb82d6 |
chore(make): add govulncheck and semgrep to 'make check'
Both checks already passed locally on the current dev tip; wiring them into the canonical pre-commit gate so security regressions fail fast instead of leaking into a release. - 'make vuln' runs govulncheck with reachability analysis against the Go vuln DB. - 'make sec' runs semgrep with p/golang + p/security-audit, metrics off, --error so findings exit non-zero. Tools must be installed locally (commands in Makefile comments). If upstream Woodpecker CI runs 'make check', it will need both binaries on the runner image. |
||
|
|
3888966e68 |
fix(deps): bump golang.org/x/net to v0.55.0 to clear reachable CVEs
govulncheck flagged two reachable vulnerabilities in golang.org/x/net@v0.52.0:
- GO-2026-5026 (idna fails to reject ASCII-only Punycode labels), reached via router.DiscoverOllama -> http.Client.Do -> idna.ToASCII.
- GO-2026-4918 (HTTP/2 transport infinite loop on bad SETTINGS_MAX_FRAME_SIZE), same call path -> http2.Transport.*.
Bumping to v0.55.0 covers both. Transitive bumps to x/crypto v0.51.0, x/sys v0.45.0, x/text v0.37.0. Post-bump govulncheck reports 0 reachable vulnerabilities and 0 in directly imported packages.
|
||
|
|
847cd5fe0c |
fix(security): use crypto/rand for session-ID suffix
Semgrep flagged math/rand for the /tmp artifact-directory session-ID generation. Modern Go (1.20+) auto-seeds the global math/rand source so this wasn't exploitable in practice, but crypto/rand is the idiomatic choice for any security-adjacent identifier and removes the finding from future security audits. Drops the mrand alias entirely; reads 8 random bytes once and masks to 24 bits to preserve the existing %06x suffix format. |
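The "read 8 random bytes once and mask to 24 bits" step could look roughly like this. The function name `sessionSuffix` and the big-endian byte interpretation are assumptions; the commit only fixes the byte count, the mask width, and the %06x format.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// sessionSuffix returns six lowercase hex digits drawn from crypto/rand.
func sessionSuffix() (string, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	// Keep the low 24 bits so %06x always yields exactly six digits.
	n := binary.BigEndian.Uint64(buf[:]) & 0xFFFFFF
	return fmt.Sprintf("%06x", n), nil
}

func main() {
	s, err := sessionSuffix()
	if err != nil {
		fmt.Println("random source unavailable:", err)
		return
	}
	fmt.Println("suffix:", s)
}
```

Masking rather than taking a modulus keeps the distribution uniform, since 2^24 divides the 64-bit range evenly.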
||
|
|
001865f069 |
fix(env): correct ANTHROPIC_API_KEY typo, add missing vars
The placeholder ANTHROPICS_API_KEY (with trailing S) silently failed: the auth layer reads ANTHROPIC_API_KEY, so anyone copying .env.example to .env and pasting their key would see gnoma never pick it up, with no clear error. Also surfaces vars that already work but weren't templated: GOOGLE_API_KEY (alternative to GEMINI_API_KEY), GNOMA_PROVIDER and GNOMA_MODEL (config overrides), and the two subprocess sandbox bypass footguns (GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX), left commented out so they don't accidentally turn on. |
||
|
|
c1c52f139d |
docs(readme): add 'no phone-home' bullet and data-flow scope note
Clarify that gnoma itself emits no telemetry to external services while being explicit that cloud-provider arms send data to those providers by design. Adds: - 'No phone-home' bullet to the differentiator list, naming the on-device path (Ollama/llama.cpp + --incognito). - 'Data flow' paragraph to the Security scope-note blockquote so the framing is consistent between the hero bullets and the Security section. |
||
|
|
7040041f13 |
docs(readme): correct firewall scope; track egress controls in TODO
The 'What makes gnoma different' bullet and Security section both implied a network-egress firewall. Today the Firewall only enforces a content boundary (secret scan, Unicode sanitize, redact/block). Reword both spots and add a Scope note. Surface the gap as a top-of-TODO entry covering per-session audit log and per-host egress allowlist, with the open design question (host-level vs per-tool) called out. Raised via r/SideProject v0.3.0 launch thread. |
||
|
|
1828151162 |
docs(claude): big-picture architecture and expanded test commands
Add a 'Big picture' section summarising the request flow (cmd → session → engine → router → security/permission → extensibility) so future Claude Code instances can orient without reading INDEX.md plus five package directories first. Note that internal/safety and internal/slm aren't in INDEX.md yet. Document the somegit.dev / GitHub mirror split and the ruleset that blocks force-push and deletion on main/dev. Expand build/test section with make check, make test-integration, single-test, and benchmark commands. |
||
|
|
b5062d59e9 |
docs(readme): hero screenshot, differentiators, status, TOC
Add docs/img/gnoma-tui.png as a hero image so visitors see the TUI above the fold instead of a wall of text. Pull the bandit router, prefer-policy, SLM, and built-in firewall out of buried sections into a 'What makes gnoma different' bullet list. Add a Status block flagging pre-1.0 and a table of contents. Move the pygmy-owl naming note and upstream/mirror URLs into a footer About section. |
||
|
|
b13a6a2801 |
docs(plans): mark v0.3.0 plans shipped
Three plans shipped end-to-end in v0.3.0; removing them from TODO.md In-flight and adding a Status: shipped header to each plan doc with the commit references.
Shipped:
- 2026-05-23-routing-defaults-refresh.md
- 2026-05-23-prefer-routing-policy.md
- 2026-05-23-startup-safety-banner.md
Still in flight (telemetry-gated, fires only if measurements support it):
- 2026-05-23-tool-router-specialization.md
|
||
|
|
8ba77c1685 |
fix(safety): env-template precision, label alignment, banner on bypass
Three polish items surfaced during the maintainer's manual smoke
of the previous safety commit.
env-template precision (false-positive fix):
The "env file" rule matched .env.* universally, which flagged
conventional templates like .env.example / .env.sample /
.env.template / .env.dist / .env.default — these hold variable
NAMES, no values, and are commonly committed. Now skipped.
Real env files (.env, .env.local, .env.production) still match.
New envTemplateSuffixes table + isEnvTemplate helper; check runs
only inside the env-file rule so the suffix denylist is scoped.
Tests added for both directions: 6 templates that must NOT flag,
6 real env files that must.
Banner label alignment:
Field labels were padded to 8 chars except "sensitive" at 9,
producing visible misalignment in the rendered banner:
cwd : /...
provider : ollama / ...
sensitive : 0 matches in cwd <- one extra space
Padded all labels to 9 chars so the ":" separators line up.
Context banner on bypass:
--dangerously-allow-anywhere previously suppressed the entire
safety block, including the informational context banner.
Bypassing the GATE is not the same as opting out of the info —
the user still wants to see cwd / git state / sensitive files
nearby. Restructured the safety block so classification + banner
always run; the bypass only skips the refuse/warn FLOW. The
bypass warning log now also includes the classified tier and
cwd path for diagnostics.
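The env-template exception can be sketched as a suffix check that runs only inside the env-file rule. The names envTemplateSuffixes and isEnvTemplate come from the commit message; the wrapper `isSensitiveEnvFile` and the exact matching logic are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// envTemplateSuffixes lists the template endings named in the commit.
var envTemplateSuffixes = []string{".example", ".sample", ".template", ".dist", ".default"}

// isEnvTemplate reports whether name ends in a known template suffix.
func isEnvTemplate(name string) bool {
	for _, suffix := range envTemplateSuffixes {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

// isSensitiveEnvFile is a hypothetical wrapper for the "env file" rule:
// .env and .env.* match unless the name is a committed template.
func isSensitiveEnvFile(name string) bool {
	if name == ".env" {
		return true
	}
	if !strings.HasPrefix(name, ".env.") {
		return false
	}
	return !isEnvTemplate(name)
}

func main() {
	for _, name := range []string{".env", ".env.local", ".env.production", ".env.example", ".env.dist", ".envrc"} {
		fmt.Printf("%-16s sensitive=%v\n", name, isSensitiveEnvFile(name))
	}
}
```

Calling isEnvTemplate only after the `.env.` prefix check is what keeps the denylist scoped: a file such as `notes.example` never reaches it.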
v0.3.0
|
||
|
|
c483656681 |
docs(plans): fix gnoma one-shot invocation in safety-banner plan
gnoma takes the prompt as a positional argument, not via -p (that's Claude Code's syntax). Surfaced when the maintainer tried the manual smoke from the plan's "Definition of done" section and hit the "flag provided but not defined: -p" error.
before: gnoma -p "test"
after: gnoma "test"
The same wrong syntax appears in the
|
||
|
|
d206b3cf09 |
docs: routing-prefer + startup-safety user docs, plan tier-shift note
README:
- New "Preferring local vs cloud" subsection under "Routing
defaults" — table of the three [router].prefer values, priority
order against forced arm / incognito / Strengths, and the
CLI-agent-counts-as-local clarification.
- New "Startup safety check" subsection under "Security" — tier
table, [safety] config block, --dangerously-allow-anywhere flag,
container detection note, link to the plan doc.
Plan doc (prefer-routing-policy):
- Approach section updated to describe the tier-shift mechanism
that actually shipped, with a clear "Implementation note"
explaining why the original score-multiplier approach was
abandoned (cost-floor math gives local arms a ~280x raw-score
advantage that any reasonable multiplier can't overcome).
- CLI-agent placement flipped from "non-local" to "local" with
rationale — implementation chose user-facing behavior axis over
the privacy axis the original draft used.
- Tier-shift rationale table replacing the multiplier rationale.
- P-3 task rewritten to reflect the actual implementation (checked
off and pointing at the right code), with the policyMultiplier
helper noted as a within-tier nudge of limited present effect.
The implementation-vs-plan deviation is now documented in both the
plan doc and the original feature commit message (
|
||
|
|
3eeb5b46d7 |
feat(safety): pre-launch cwd classifier + context banner
Implements S-1 through S-7 of the startup-safety-banner plan.
Adds a pre-launch safety check that classifies the current working
directory into three tiers and gates the launch:
TierRefuse /, /etc, /sys, /proc, /usr, /var, /bin, /sbin, /boot,
/root, /dev (Linux) and /System, /Library, /private,
/Applications (macOS). Refuses with exit 2 unless
--dangerously-allow-anywhere is passed.
TierWarn $HOME, ~/Desktop, ~/Downloads, ~/Documents, ~/.config,
~/.local, ~/.cache, /tmp, and similar dumping grounds.
Prints a banner and reads a single y/Y from stdin to
confirm; any other input (or EOF, including piped/
scripted invocation) aborts with exit 1.
TierOK Anywhere with a recognized project marker (.gnoma/,
go.mod, package.json, pyproject.toml, Cargo.toml,
Makefile, Dockerfile, build.gradle*, pom.xml) or
inside a git repo. No prompt; banner only.
Project markers and git-repo presence override the TierWarn check —
a project dir inside $HOME stays TierOK. The require_project_marker
config knob can flip that for strict users.
Container detection: when /.dockerenv or /run/.containerenv exists,
TierRefuse downgrades to TierWarn (devcontainers often chroot to /
or similar). Best-effort; false positives only soften the gate.
The context banner is always rendered (TierOK, TierWarn, TierRefuse
alike) and summarizes: cwd, git branch + dirty state, project type,
provider/model, modes (permission, incognito, prefer), and a
top-level sensitive-file inventory. Inventory matches .env,
.env.*, env.local; private-key extensions (.pem, .key, .crt, .p12,
.pfx); SSH key names (id_rsa, id_ed25519, ...); credentials files;
.netrc / .pgpass; KeePass vaults; and .ssh/ .aws/ .kube/ .gcloud/
.azure/ .docker/ directories. Precision-tested: .envrc and
secret_handler.go do NOT match. Bounded at 1000 entries.
Architecture:
- internal/safety/cwd.go — Classification + symlink-resolving tier
classifier with platform-specific roots and container detection.
- internal/safety/sensitive.go — pattern-based top-level scanner,
deterministic ordering, scanLimit guard against pathological dirs.
- internal/safety/banner.go — pure render functions for the warn
prefix, refuse message, and context banner. Safe for golden-string
testing.
- internal/config/config.go — new [safety] section with three
config keys, defaults applied via ResolvedSafety() helper. Pointer
fields distinguish "user omitted" from "user set to false."
- cmd/gnoma/main.go — gate runs after subcommand dispatch (so
`gnoma providers / profile / slm / router` skip the prompt) and
before provider creation. --dangerously-allow-anywhere bypasses
the gate with an explicit log warning.
The runtime keypress reads up to 8 bytes from os.Stdin and accepts
only "y" / "Y" trimmed; EOF returns false (piped invocations
without the flag will abort). Documented in the readYesConfirmation
helper. Manual smoke (per plan):
- `cd / && gnoma -p test` → refuses
- `cd ~ && gnoma` → warns + keypress
- `cd ~/git/some-repo && gnoma` → banner only
- subcommands skip the gate entirely
Linux + macOS classification; Windows path handling deferred per
plan (treated as TierOK there until follow-up).
Refs: docs/superpowers/plans/2026-05-23-startup-safety-banner.md
|
||
|
|
f9094f68f3 |
feat(router): [router].prefer = local | cloud | auto
Implements P-1 through P-6 of the prefer-routing-policy plan.
Adds a config knob that biases routing toward local arms, cloud
arms, or leaves selection unchanged. Default "auto" is
byte-identical to pre-change behavior (the new armTier path with
PreferAuto returns the same value as the old single-arg function).
Mechanism diverged from the plan after empirical testing:
The plan called for a score multiplier applied in bestScored.
Tests revealed the existing cost-floor math (scoreArm divides by
weighted cost which collapses to ~0.001 for free local arms) gives
local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier
can't overcome. A tier-shift in armTier turned out cleaner:
PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent)
get +2 tier shift, landing behind locals.
PreferCloud: IsLocal arms get +2 tier shift, landing behind
cloud. SLM tier-0 arms shift to tier 2 — still
below cloud's tier 3 — so the SLM-protection
semantic (small stuff stays on the small model)
survives PreferCloud. This matches the open
question in the plan, now resolved as: yes, SLMs
keep winning under PreferCloud by design.
The policyMultiplier was kept in bestScored as a within-tier
nudge (mostly cosmetic in practice given the cost-floor dynamics
described above; could matter when costs are calibrated). Worth
revisiting once router-wide cost calibration lands.
Strengths cross-tier promotion is unaffected: the promoted-set
path in selectBest bypasses armTier entirely, so a strongly-tagged
cloud arm still wins SecurityReview tasks under PreferLocal
(validated by TestPreferPolicy_StrengthsBeatsMultiplier).
CLI-agent subprocess arms count as "local" for PreferLocal
purposes — they proxy to cloud but the user-visible behavior is
local. Users who want to exclude them can use --provider X.
Forced arms (--provider X) and incognito take priority over the
policy: forced arm test pins this, incognito-still-wins test pins
the LocalOnly hard filter dominating PreferCloud.
Test coverage (prefer_test.go): ParsePreferPolicy / String round
trips; policyMultiplier table; acceptance scenarios across all
three policies with adjacent-tier arms; SLM-still-wins under
PreferCloud; Strengths beats multiplier; forced-arm bypass;
incognito beats prefer; lone cloud arm wins when no local feasible.
Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md
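A rough sketch of the tier-shift idea, under stated assumptions: each arm has a base tier, lower tiers are considered first, and the shift is a flat +2. The `Arm` fields beyond IsLocal and IsCLIAgent, the `BaseTier` values, and the function body are illustrative; only the policy names, the +2 shift, and the SLM tier-0 and cloud tier-3 figures come from the commit message.

```go
package main

import "fmt"

type PreferPolicy int

const (
	PreferAuto PreferPolicy = iota
	PreferLocal
	PreferCloud
)

// Arm carries only what this sketch needs. BaseTier stands in for the value
// the pre-existing single-argument tier function returned.
type Arm struct {
	Name       string
	BaseTier   int
	IsLocal    bool
	IsCLIAgent bool
}

// armTier returns the effective tier (assumed: lower is preferred).
func armTier(a Arm, p PreferPolicy) int {
	const shift = 2
	switch p {
	case PreferLocal:
		// Only true API arms move back; CLI agents count as local here.
		if !a.IsLocal && !a.IsCLIAgent {
			return a.BaseTier + shift
		}
	case PreferCloud:
		if a.IsLocal {
			return a.BaseTier + shift
		}
	}
	return a.BaseTier
}

func main() {
	slm := Arm{Name: "slm", BaseTier: 0, IsLocal: true}
	cloud := Arm{Name: "cloud", BaseTier: 3}
	for _, p := range []PreferPolicy{PreferAuto, PreferLocal, PreferCloud} {
		fmt.Printf("policy=%d slm=%d cloud=%d\n", p, armTier(slm, p), armTier(cloud, p))
	}
}
```

With these numbers the SLM arm lands on tier 2 under PreferCloud, still ahead of the cloud arm on tier 3, which is the SLM-protection behaviour the message describes.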
|
||
|
|
162c8b1017 |
docs(plans): prefer-routing-policy and startup-safety-banner
Two parallel pre-flight plans surfaced in the 2026-05-23 session, both deferred while the routing-defaults-refresh implementation landed. Drafted as separate plans because they're independent: the prefer-policy is a router scoring change; the safety banner is a launch-time check that never touches the router.
prefer-routing-policy
[router].prefer = "local" | "cloud" | "auto": soft score multiplier (0.3 / 0.5 / 1.0) biasing toward local or cloud arms while preserving Strengths cross-tier promotion and bandit learning. Default "auto" is byte-identical to current behavior. Forced arms and incognito retain priority. CLI-agent subprocess arms count as non-local for this knob (they proxy to cloud).
startup-safety-banner
Three-tier cwd classification at launch: refuse in /etc /sys and other system roots; warn+keypress in $HOME, /tmp, ~/Desktop, ~/Downloads; OK inside any git repo or directory with a project marker (.gnoma/, go.mod, package.json, etc.). Always shows a context banner with cwd, git state, model, modes, and a top-level sensitive-file inventory (.env, id_rsa, *.pem, .ssh/, etc.; informational only, no recursion, capped at 1000 entries). Bypass via --dangerously-allow-anywhere. Complements the in-flight sensitive-content unified-policy TODO item: this is the pre-flight layer, that is the runtime input-path layer.
Both plans default-on with safe defaults; both have explicit out-of-scope sections to prevent scope creep during implementation. Linux + macOS first; Windows path classification deferred. TODO.md surfaces both as in-flight.
|
||
|
|
c99b2c64ad |
docs(readme): document routing defaults table and [[arms]] overrides
Closes R-8 of the routing-defaults plan. Adds a new "Routing defaults" section between Config and SLM that documents what arms ship with out-of-the-box — the family-keyed Strengths / MaxComplexity / CostWeight matrix plus the non-chat exclude list. Also introduces the [[arms]] override block in the README for the first time (previously undocumented), showing how users keep priority over the defaults. Links back to the plan doc for the benchmark sources and per-entry rationale. |
||
|
|
2f8d4c412f |
feat(router): cloud-arm defaults, gpt-5.3-codex registration
Closes R-4 and R-5 of the routing-defaults plan.
R-4: Strengths + CostWeight defaults for closed frontier models.
Cloud entries land in the same knownFamilyDefaults table as local
ones, with MaxComplexity intentionally left zero (cloud arms get
no complexity ceiling). CostWeight tuned per the plan's rationale:
claude-opus-4-7 → Planning/SecurityReview/Debug/Refactor, 0.3
claude-sonnet-4-6 → Generation/Refactor/Review, 0.7
gpt-5.5 → Planning/SecurityReview/Generation, 0.3
gpt-5.3-codex → Generation/Refactor/Debug/UnitTest, 0.6
gpt-5.2 → Orchestration/Review, 0.8
gemini-3.1-pro → Planning/Review/Orchestration, 0.5
gemini-3.5-flash → Boilerplate/Explain/Orchestration, 1.2
The 0.3 weight on frontier arms keeps them competitive on
SecurityReview / Planning despite $4+/Mtok; 1.2 on Gemini Flash
penalizes cost more so it only wins when cost is genuinely
decisive (boilerplate, explain).
Mechanism: extracted applyFamilyDefaults into defaults.go and call
it from Router.RegisterArm. Single source of truth — both local
discovery and the primary-provider path in cmd/gnoma/main.go now
flow through the same defaults application. Removed the duplicate
apply block from RegisterDiscoveredModels.
Legacy model IDs (claude-opus-4-20250514, gpt-4o, o3, gemini-2.5-pro,
etc.) intentionally do not match any table entry — keeps users on
pinned older models safe from imposed 2026 Strengths.
R-5: gpt-5.3-codex registration.
- internal/provider/openai/provider.go: added to fallbackModels
and inferOpenAIModelCapabilities (400K context, 32K output).
- internal/provider/ratelimits.go: gpt-5.3-codex and its dated
alias gpt-5.3-codex-2026-02-15 added with the same Tier 1
quotas as gpt-5.2.
Gemini 3.x (3.1-pro-preview, 3.5-flash, 3.1-flash-lite) was already
registered in both google/provider.go and ratelimits.go — no change
needed for that part of R-5.
Test coverage:
- ResolveFamilyDefaults table-driven across all 7 cloud entries
including prefix-sharing (gpt-5.5-pro → gpt-5.5 defaults,
gemini-3.1-pro-preview → gemini-3.1-pro defaults).
- Legacy IDs return !ok.
- RegisterArm applies cloud defaults end-to-end.
- User-supplied Strengths and CostWeight are not overridden.
- ID.Model() fallback works when ModelName is empty (test code
often constructs arms this way).
Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
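The longest-prefix lookup that the tests exercise can be illustrated with a reduced table. The CostWeight values are the ones listed in this commit; the map shape, the `resolveFamily` name, and the namespace-stripping detail are assumptions for the sketch rather than the real ResolveFamilyDefaults.

```go
package main

import (
	"fmt"
	"strings"
)

// costWeightByFamily holds the cloud CostWeight values from the commit.
var costWeightByFamily = map[string]float64{
	"claude-opus-4-7":   0.3,
	"claude-sonnet-4-6": 0.7,
	"gpt-5.5":           0.3,
	"gpt-5.3-codex":     0.6,
	"gpt-5.2":           0.8,
	"gemini-3.1-pro":    0.5,
	"gemini-3.5-flash":  1.2,
}

// resolveFamily returns the longest table key that prefixes modelID.
// An "org/" namespace, if present, is dropped before matching.
func resolveFamily(modelID string) (family string, ok bool) {
	if i := strings.LastIndex(modelID, "/"); i >= 0 {
		modelID = modelID[i+1:]
	}
	for prefix := range costWeightByFamily {
		if strings.HasPrefix(modelID, prefix) && len(prefix) > len(family) {
			family = prefix
		}
	}
	return family, family != ""
}

func main() {
	for _, id := range []string{"gpt-5.5-pro", "gemini-3.1-pro-preview", "gpt-5.3-codex-2026-02-15", "gpt-4o"} {
		family, ok := resolveFamily(id)
		fmt.Printf("%-26s family=%q ok=%v weight=%v\n", id, family, ok, costWeightByFamily[family])
	}
}
```

Prefix matching is what lets dated aliases and "-pro" or "-preview" variants inherit a family's defaults, while legacy IDs fall through with ok=false and keep whatever the user configured.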
|
||
|
|
9bb775a4aa |
feat(router): full local family defaults table with size-keyed ceilings
Expands the family-defaults scaffold to 23 entries covering the local models that currently appear in real Ollama fleets: coder specialists (qwen3-coder, devstral, qwen2.5-coder, yi-coder, deepseek-coder, starcoder), reasoners (phi-4, phi-4-mini), Gemma 2/3/4 (including the "edge" e2b/e4b variants under both Ollama and GGUF naming), Qwen 2.5/3/3.5 with a catch-all qwen entry, Mistral/Ministral (incl. the 24B mistral-small-3), Llama 3.2/4, tiny3.5 (reec's distill family), Granite, GLM (incl. glm-ocr specialist), and MiniCPM-V. Five families that span wide parameter ranges (qwen3.5, qwen3, qwen2.5, ministral-3, tiny3.5) now use SizeCap ladders instead of a flat MaxComplexity. A new parseSizeFromModelID helper splits the model ID on :/-_/ and matches pure <N>b/<N>m tokens, correctly ignoring qwen3.5 version strings, e2b edge tags, a3b MoE active params, and v0.3 version suffixes. ResolveMaxComplexity wraps ResolveFamilyDefaults plus the SizeCap traversal, falling back to the smallest cap when size parsing fails (conservative). Discovery's apply path now goes through it so SizeCap entries actually take effect. 
Test coverage:
- parseSizeFromModelID (11 cases)
- ResolveFamilyDefaults longest-prefix discipline (19 cases)
- Unknown-family fallback returns !ok
- ResolveMaxComplexity size-keyed ladder (13 cases)
- Size-parse-failure fallback
- knownFamilyDefaults invariants: SizeCaps ordered largest-first, SizeCaps and MaxComplexity mutually exclusive per entry
- Routing-payoff integration: 3 arms (tiny3.5:1.5b, phi-4:14b, qwen3-coder:30b) get picked for TaskGeneration / TaskPlanning / TaskBoilerplate respectively, without any [[arms]] config
- Local fleet visibility: the maintainer's actual `ollama ls` inventory registers correctly with expected MaxComplexity and Strengths; embeddinggemma stays filtered out
The Planning sub-case surfaced a separate issue worth flagging: heuristicQuality floors out at 0.55 for a generic 14B local model without ThinkingModes, below TaskPlanning's 0.60 threshold. The test mutates phi-4's capabilities post-registration to reflect reality (phi-4 is reasoning-tuned). A discovery-side thinking-capability detection is out of scope for this plan but flagged in the test comment for follow-up.
Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
|
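The size-token rule behind parseSizeFromModelID (split the ID on separators, accept only pure <N>b or <N>m tokens) can be sketched like this. The function name `parseSizeB`, the separator set `:/-_`, the regular expression, and the "billions as float64" return shape are assumptions; the model IDs in main are examples, not an inventory.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sizeToken matches a whole token such as "30b", "1.5b" or "270m".
// Tokens like "e2b", "a3b", "v0.3" or "qwen3.5" fail the anchors.
var sizeToken = regexp.MustCompile(`^(\d+(?:\.\d+)?)([bm])$`)

// parseSizeB returns the parameter count in billions for the first
// size token found in modelID.
func parseSizeB(modelID string) (float64, bool) {
	tokens := strings.FieldsFunc(strings.ToLower(modelID), func(r rune) bool {
		return strings.ContainsRune(":/-_", r)
	})
	for _, tok := range tokens {
		m := sizeToken.FindStringSubmatch(tok)
		if m == nil {
			continue
		}
		n, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue
		}
		if m[2] == "m" {
			n /= 1000 // millions to billions
		}
		return n, true
	}
	return 0, false
}

func main() {
	for _, id := range []string{"tiny3.5:1.5b", "qwen3-coder:30b", "qwen3-30b-a3b", "mistral-7b-v0.3", "gemma-4:e2b", "qwen3.5"} {
		size, ok := parseSizeB(id)
		fmt.Printf("%-18s size=%v ok=%v\n", id, size, ok)
	}
}
```

Anchoring the pattern to the whole token is what rejects version strings and MoE tags, and returning ok=false lets the caller fall back to the smallest cap as the commit describes.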
||
|
|
a79e99199d |
feat(router): non-chat exclude, vision prefixes, family-defaults scaffold
Discovery previously registered every model returned by Ollama as a chat arm, including embeddings, ASR, TTS, audio realtime, and rerankers — which then failed at inference time when the router selected them. Local arms also shipped with all-zero defaults, so selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b was effectively random. This change covers tasks R-1, R-2, R-6 from the routing-defaults plan. - nonChatModelPatterns + isNonChatModel substring matcher; matched IDs are skipped during RegisterDiscoveredModels. Covers whisper, moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding, embeddinggemma, -reranker, lfm2. - knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3 and minicpm-v entries stay for regression coverage. - New internal/router/defaults.go with FamilyDefaults struct, knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b resolves to "tiny3.5"). Single entry for now: functiongemma is registered with Disabled=true and MaxComplexity=0.40, reserved for the future ArmRoleToolRouter path. Table will grow in R-3. - RegisterDiscoveredModels consults ResolveFamilyDefaults and only populates fields that are still zero on the arm, so user [[arms]] overrides keep priority. Plans: - docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md - docs/superpowers/plans/2026-05-23-tool-router-specialization.md TODO.md surfaces both as in-flight items. |