DiscoverOllama() interpreted a nil probeCache as 'skip probing
entirely' rather than 'probe but don't cache.' cmd/gnoma/main.go's
synchronous discovery path passes nil, so every ollama-discovered
model got SupportsTools=false (the Go zero value), regardless of
what ollama actually reported in its capabilities field.
The symptom: filterFeasible rejected every ollama arm for any
tool-requiring task with reason=tools_required_but_unsupported,
even when ollama itself reported the model as tool-capable. Verified
via curl: qwen3:14b advertises capabilities=[completion, tools,
thinking] and has 'tools' in its template, but the gnoma arm shipped
with tool_use_capability=false.
Fix: always run probeOllamaModel; treat probeCache as an optional
memoisation aid only. nil cache now means 'no caching across calls'
not 'no probing.' For users with many models, passing a real cache
still avoids redundant HTTP calls — semantics for that path are
unchanged.
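A minimal sketch of the corrected contract, assuming a map-backed
cache. ProbeResult and probeWithOptionalCache are hypothetical names
for illustration, not the actual gnoma signatures:

```go
package main

import "fmt"

// ProbeResult is a hypothetical stand-in for what a model probe reports.
type ProbeResult struct {
	SupportsTools bool
}

// probeWithOptionalCache always probes on a cache miss. A nil cache is
// valid and only disables memoisation; it never disables probing.
func probeWithOptionalCache(cache map[string]ProbeResult, model string, probe func(string) ProbeResult) ProbeResult {
	if cache != nil {
		if hit, ok := cache[model]; ok {
			return hit
		}
	}
	result := probe(model)
	if cache != nil {
		cache[model] = result // writing to a nil map would panic, hence the guard
	}
	return result
}

func main() {
	calls := 0
	probe := func(string) ProbeResult {
		calls++
		return ProbeResult{SupportsTools: true}
	}

	// nil cache: the probe still runs, once per call.
	first := probeWithOptionalCache(nil, "qwen3:14b", probe)
	probeWithOptionalCache(nil, "qwen3:14b", probe)
	fmt.Println(first.SupportsTools, calls)

	// Real cache: the second lookup is served from memory.
	cache := map[string]ProbeResult{}
	probeWithOptionalCache(cache, "qwen3:14b", probe)
	probeWithOptionalCache(cache, "qwen3:14b", probe)
	fmt.Println(calls)
}
```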
Surfaced via the new filterFeasible Debug logging from the previous
commit, which made the per-arm rejection reasons visible.
Closes R-4 and R-5 of the routing-defaults plan.
R-4: Strengths + CostWeight defaults for closed frontier models.
Cloud entries land in the same knownFamilyDefaults table as local
ones, with MaxComplexity intentionally left zero (cloud arms get
no complexity ceiling). CostWeight tuned per the plan's rationale:
claude-opus-4-7 → Planning/SecurityReview/Debug/Refactor, 0.3
claude-sonnet-4-6 → Generation/Refactor/Review, 0.7
gpt-5.5 → Planning/SecurityReview/Generation, 0.3
gpt-5.3-codex → Generation/Refactor/Debug/UnitTest, 0.6
gpt-5.2 → Orchestration/Review, 0.8
gemini-3.1-pro → Planning/Review/Orchestration, 0.5
gemini-3.5-flash → Boilerplate/Explain/Orchestration, 1.2
The 0.3 weight on frontier arms keeps them competitive on
SecurityReview / Planning despite $4+/Mtok; 1.2 on Gemini Flash
penalizes cost more so it only wins when cost is genuinely
decisive (boilerplate, explain).
Mechanism: extracted applyFamilyDefaults into defaults.go and call
it from Router.RegisterArm. Single source of truth — both local
discovery and the primary-provider path in cmd/gnoma/main.go now
flow through the same defaults application. Removed the duplicate
apply block from RegisterDiscoveredModels.
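A rough sketch of the "fill only what is still zero" rule that keeps
user-supplied values intact. The struct shapes below are reduced,
hypothetical versions of the router types; only the field names
Strengths, CostWeight and ModelName come from this log:

```go
package main

import "fmt"

// FamilyDefaults and Arm are simplified stand-ins (assumed shapes).
type FamilyDefaults struct {
	Strengths  []string
	CostWeight float64
}

type Arm struct {
	ModelName  string
	Strengths  []string
	CostWeight float64
}

// applyDefaults fills only fields that are still zero-valued, so
// anything the user configured explicitly wins over the table.
func applyDefaults(arm *Arm, d FamilyDefaults) {
	if len(arm.Strengths) == 0 {
		// Copy so the arm does not alias the shared defaults table.
		arm.Strengths = append([]string(nil), d.Strengths...)
	}
	if arm.CostWeight == 0 {
		arm.CostWeight = d.CostWeight
	}
}

func main() {
	codex := FamilyDefaults{
		Strengths:  []string{"Generation", "Refactor", "Debug", "UnitTest"},
		CostWeight: 0.6,
	}

	fresh := Arm{ModelName: "gpt-5.3-codex"}
	applyDefaults(&fresh, codex)
	fmt.Println(fresh.Strengths, fresh.CostWeight)

	// A user-set CostWeight survives; only Strengths is filled in.
	tuned := Arm{ModelName: "gpt-5.3-codex", CostWeight: 0.9}
	applyDefaults(&tuned, codex)
	fmt.Println(tuned.Strengths, tuned.CostWeight)
}
```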
Legacy model IDs (claude-opus-4-20250514, gpt-4o, o3, gemini-2.5-pro,
etc.) intentionally do not match any table entry — keeps users on
pinned older models safe from imposed 2026 Strengths.
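The lookup behaviour described in this log (longest matching prefix,
namespace stripping for local IDs, no match for legacy IDs) can be
sketched as follows. The key list is a subset of the family names
mentioned in these messages, and resolveFamily plus the lowercasing
step are assumptions rather than the real ResolveFamilyDefaults:

```go
package main

import (
	"fmt"
	"strings"
)

// knownFamilies holds illustrative table keys taken from this log.
var knownFamilies = []string{
	"claude-opus-4-7", "gpt-5.5", "gpt-5.3-codex", "gemini-3.1-pro",
	"qwen", "qwen2.5", "qwen3", "qwen3.5", "tiny3.5",
}

// resolveFamily strips an org/ namespace, then returns the longest
// table key that is a prefix of the model ID.
func resolveFamily(modelID string) (string, bool) {
	id := strings.ToLower(modelID)
	if i := strings.LastIndex(id, "/"); i >= 0 {
		id = id[i+1:]
	}
	best := ""
	for _, key := range knownFamilies {
		if strings.HasPrefix(id, key) && len(key) > len(best) {
			best = key
		}
	}
	return best, best != ""
}

func main() {
	ids := []string{
		"gpt-5.5-pro", "gemini-3.1-pro-preview", "qwen3.5:9b",
		"reecdev/tiny3.5:1.5b", "claude-opus-4-20250514", "gpt-4o",
	}
	for _, id := range ids {
		family, ok := resolveFamily(id)
		fmt.Printf("%-24s family=%q ok=%v\n", id, family, ok)
	}
}
```

Picking the longest key is what lets qwen3.5:9b land on "qwen3.5"
rather than the catch-all "qwen", while pinned legacy IDs fall through
with ok=false.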
R-5: gpt-5.3-codex registration.
- internal/provider/openai/provider.go: added to fallbackModels
and inferOpenAIModelCapabilities (400K context, 32K output).
- internal/provider/ratelimits.go: gpt-5.3-codex and its dated
alias gpt-5.3-codex-2026-02-15 added with the same Tier 1
quotas as gpt-5.2.
Gemini 3.x (3.1-pro-preview, 3.5-flash, 3.1-flash-lite) was already
registered in both google/provider.go and ratelimits.go — no change
needed for that part of R-5.
Test coverage:
- ResolveFamilyDefaults table-driven across all 7 cloud entries
including prefix-sharing (gpt-5.5-pro → gpt-5.5 defaults,
gemini-3.1-pro-preview → gemini-3.1-pro defaults).
- Legacy IDs return !ok.
- RegisterArm applies cloud defaults end-to-end.
- User-supplied Strengths and CostWeight are not overridden.
- ID.Model() fallback works when ModelName is empty (test code
often constructs arms this way).
Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
Expands the family-defaults scaffold to 23 entries covering the local
models that currently appear in real Ollama fleets: coder specialists
(qwen3-coder, devstral, qwen2.5-coder, yi-coder, deepseek-coder,
starcoder), reasoners (phi-4, phi-4-mini), Gemma 2/3/4 (including the
"edge" e2b/e4b variants under both Ollama and GGUF naming), Qwen
2.5/3/3.5 with a catch-all qwen entry, Mistral/Ministral (incl. the
24B mistral-small-3), Llama 3.2/4, tiny3.5 (reec's distill family),
Granite, GLM (incl. glm-ocr specialist), and MiniCPM-V.
Five families that span wide parameter ranges (qwen3.5, qwen3,
qwen2.5, ministral-3, tiny3.5) now use SizeCap ladders instead of a
flat MaxComplexity. A new parseSizeFromModelID helper splits the
model ID on ':', '/', '-', and '_' and matches pure <N>b/<N>m tokens,
correctly ignoring qwen3.5 version strings, e2b edge tags, a3b MoE
active params, and v0.3 version suffixes.
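One way to implement this kind of token matching, as a sketch only.
The function name parseSize, the regular expression, the choice to
return the first matching token, and reporting the size in billions
of parameters are all assumptions; the real helper may differ:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sizeToken matches a whole token such as "30b", "1.5b" or "270m".
// Tokens with a leading letter (e2b, a3b, v0.3, qwen3.5) do not match.
var sizeToken = regexp.MustCompile(`^(\d+(?:\.\d+)?)([bm])$`)

// parseSize returns the parameter count in billions (assumed unit)
// and whether a pure size token was found in the model ID.
func parseSize(modelID string) (float64, bool) {
	tokens := strings.FieldsFunc(strings.ToLower(modelID), func(r rune) bool {
		return r == ':' || r == '/' || r == '-' || r == '_'
	})
	for _, tok := range tokens {
		m := sizeToken.FindStringSubmatch(tok)
		if m == nil {
			continue
		}
		n, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue
		}
		if m[2] == "m" {
			n /= 1000 // millions to billions
		}
		return n, true
	}
	return 0, false
}

func main() {
	ids := []string{
		"qwen3.5:9b", "qwen3-coder:30b-a3b", "gemma3n:e2b",
		"mistral:7b-instruct-v0.3", "reecdev/tiny3.5:1.5b",
	}
	for _, id := range ids {
		size, ok := parseSize(id)
		fmt.Printf("%-26s size=%v ok=%v\n", id, size, ok)
	}
}
```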
ResolveMaxComplexity wraps ResolveFamilyDefaults plus the SizeCap
traversal, falling back to the smallest cap when size parsing fails
(conservative). Discovery's apply path now goes through it so
SizeCap entries actually take effect.
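The ladder traversal might look roughly like this. SizeCap's fields,
the "at least MinParamsB" threshold semantics, and every number in
the example ladder are hypothetical; only the largest-first ordering
and the smallest-cap fallback come from this message:

```go
package main

import "fmt"

// SizeCap is an assumed ladder rung: models with at least MinParamsB
// billion parameters receive MaxComplexity.
type SizeCap struct {
	MinParamsB    float64
	MaxComplexity float64
}

// maxComplexityFor walks a largest-first ladder and returns the first
// rung the model qualifies for. When the size is unknown, or smaller
// than every rung, it falls back to the last (smallest) cap.
func maxComplexityFor(caps []SizeCap, sizeB float64, sizeKnown bool) float64 {
	if len(caps) == 0 {
		return 0
	}
	if sizeKnown {
		for _, c := range caps {
			if sizeB >= c.MinParamsB {
				return c.MaxComplexity
			}
		}
	}
	return caps[len(caps)-1].MaxComplexity
}

func main() {
	// Illustrative values only, ordered largest-first.
	ladder := []SizeCap{
		{MinParamsB: 30, MaxComplexity: 0.90},
		{MinParamsB: 14, MaxComplexity: 0.75},
		{MinParamsB: 7, MaxComplexity: 0.60},
		{MinParamsB: 0, MaxComplexity: 0.40},
	}
	fmt.Println(maxComplexityFor(ladder, 32, true))
	fmt.Println(maxComplexityFor(ladder, 8, true))
	fmt.Println(maxComplexityFor(ladder, 0, false)) // size parse failed
}
```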
Test coverage:
- parseSizeFromModelID (11 cases)
- ResolveFamilyDefaults longest-prefix discipline (19 cases)
- Unknown-family fallback returns !ok
- ResolveMaxComplexity size-keyed ladder (13 cases)
- Size-parse-failure fallback
- knownFamilyDefaults invariants: SizeCaps ordered largest-first,
SizeCaps and MaxComplexity mutually exclusive per entry
- Routing-payoff integration: 3 arms (tiny3.5:1.5b, phi-4:14b,
qwen3-coder:30b) get picked for TaskBoilerplate / TaskPlanning /
TaskGeneration respectively, without any [[arms]] config
- Local fleet visibility: the maintainer's actual `ollama ls`
inventory registers correctly with expected MaxComplexity and
Strengths; embeddinggemma stays filtered out
The Planning sub-case surfaced a separate issue worth flagging:
heuristicQuality floors out at 0.55 for a generic 14B local model
without ThinkingModes, below TaskPlanning's 0.60 threshold. The test
mutates phi-4's capabilities post-registration to reflect reality
(phi-4 is reasoning-tuned). A discovery-side thinking-capability
detection is out of scope for this plan but flagged in the test
comment for follow-up.
Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
Discovery previously registered every model returned by Ollama as a
chat arm, including embeddings, ASR, TTS, audio realtime, and
rerankers — which then failed at inference time when the router
selected them. Local arms also shipped with all-zero defaults, so
selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b
was effectively random.
This change covers tasks R-1, R-2, R-6 from the routing-defaults plan.
- nonChatModelPatterns + isNonChatModel substring matcher; matched
IDs are skipped during RegisterDiscoveredModels. Covers whisper,
moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding,
embeddinggemma, -reranker, lfm2.
- knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3
and minicpm-v entries stay for regression coverage.
- New internal/router/defaults.go with FamilyDefaults struct,
knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix
lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b
resolves to "tiny3.5"). Single entry for now: functiongemma is
registered with Disabled=true and MaxComplexity=0.40, reserved for
the future ArmRoleToolRouter path. Table will grow in R-3.
- RegisterDiscoveredModels consults ResolveFamilyDefaults and only
populates fields that are still zero on the arm, so user [[arms]]
overrides keep priority.
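The substring matcher from the first bullet above could be sketched
like this. The pattern list is copied from that bullet; lowercasing
the ID before matching is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// nonChatModelPatterns mirrors the substrings listed in this message.
var nonChatModelPatterns = []string{
	"whisper", "moonshine", "kokoros", "vibevoice",
	"-asr", "-tts", "-audio", "-embedding",
	"embeddinggemma", "-reranker", "lfm2",
}

// isNonChatModel reports whether the ID contains any non-chat pattern.
func isNonChatModel(modelID string) bool {
	id := strings.ToLower(modelID) // assumption: matching is case-insensitive
	for _, p := range nonChatModelPatterns {
		if strings.Contains(id, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, id := range []string{"embeddinggemma:300m", "whisper-large-v3", "qwen3-coder:30b", "phi-4:14b"} {
		fmt.Printf("%-22s skip=%v\n", id, isNonChatModel(id))
	}
}
```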
Plans:
- docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
- docs/superpowers/plans/2026-05-23-tool-router-specialization.md
TODO.md surfaces both as in-flight items.
Task gains a RequiresVision bool; filterFeasible enforces it on
both the primary feasibility pass and the last-resort fallback
(no degradation to a non-vision arm — the model literally cannot
consume image bytes).
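A simplified sketch of both passes. The types are reduced stand-ins,
the vision rejection reason string is invented for illustration (only
tools_required_but_unsupported appears in this log), and which
constraints the last-resort pass relaxes is an assumption apart from
vision, which it never relaxes:

```go
package main

import "fmt"

type Task struct {
	RequiresTools  bool
	RequiresVision bool
}

type Arm struct {
	ID             string
	SupportsTools  bool
	SupportsVision bool
}

type Rejection struct {
	ArmID  string
	Reason string
}

// filterFeasible is the primary pass: every hard requirement applies.
func filterFeasible(arms []Arm, task Task) ([]Arm, []Rejection) {
	var feasible []Arm
	var rejected []Rejection
	for _, arm := range arms {
		switch {
		case task.RequiresVision && !arm.SupportsVision:
			// Hypothetical reason string.
			rejected = append(rejected, Rejection{ArmID: arm.ID, Reason: "vision_required_but_unsupported"})
		case task.RequiresTools && !arm.SupportsTools:
			rejected = append(rejected, Rejection{ArmID: arm.ID, Reason: "tools_required_but_unsupported"})
		default:
			feasible = append(feasible, arm)
		}
	}
	return feasible, rejected
}

// lastResort relaxes other constraints (assumed) but still refuses
// to hand an image task to an arm without vision support.
func lastResort(arms []Arm, task Task) (Arm, bool) {
	for _, arm := range arms {
		if task.RequiresVision && !arm.SupportsVision {
			continue
		}
		return arm, true
	}
	return Arm{}, false
}

func main() {
	arms := []Arm{
		{ID: "ollama/qwen3:14b", SupportsTools: true},
		{ID: "ollama/llava:7b", SupportsVision: true},
	}
	ok, rejected := filterFeasible(arms, Task{RequiresVision: true})
	fmt.Println(len(ok), rejected)

	_, found := lastResort(arms[:1], Task{RequiresVision: true})
	fmt.Println(found)
}
```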
Ollama discovery now probes /api/show for vision capability:
- details.families containing "clip" / "mllama" / "*vl"
- capabilities array containing "vision" (newer Ollama)
- name-prefix fallback for releases that predate either
(llava, qwen2.5-vl, llama3.2-vision, moondream, pixtral, etc.)
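The three signals above can be combined roughly as follows. This is a
sketch: it decodes only the two /api/show fields named in the list,
reads "*vl" as "family name ends in vl", and uses a shortened prefix
list. supportsVision is a hypothetical name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// showResponse decodes only the fields this check needs.
type showResponse struct {
	Details struct {
		Families []string `json:"families"`
	} `json:"details"`
	Capabilities []string `json:"capabilities"`
}

// visionPrefixes is a partial name-prefix fallback list from this log.
var visionPrefixes = []string{"llava", "qwen2.5-vl", "llama3.2-vision", "moondream", "pixtral"}

// supportsVision checks capabilities, then families, then the name.
func supportsVision(name string, showBody []byte) bool {
	var r showResponse
	if err := json.Unmarshal(showBody, &r); err == nil {
		for _, c := range r.Capabilities {
			if c == "vision" {
				return true
			}
		}
		for _, f := range r.Details.Families {
			f = strings.ToLower(f)
			// Assumption: "*vl" means any family ending in "vl".
			if f == "clip" || f == "mllama" || strings.HasSuffix(f, "vl") {
				return true
			}
		}
	}
	lower := strings.ToLower(name)
	for _, p := range visionPrefixes {
		if strings.HasPrefix(lower, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsVision("gemma3:4b", []byte(`{"capabilities":["completion","vision"]}`)))
	fmt.Println(supportsVision("custom", []byte(`{"details":{"families":["llama","clip"]}}`)))
	fmt.Println(supportsVision("llava:7b", []byte(`{}`)))
	fmt.Println(supportsVision("qwen3:14b", []byte(`{"capabilities":["completion","tools","thinking"]}`)))
}
```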
OllamaProbeResult replaces the map[string]bool tool cache so the
single /api/show call can populate tools + vision + ctx-size in
one probe. DiscoverOllama / DiscoverLocalModels signatures updated;
nil-cache callers in cmd/gnoma keep working unchanged.
RegisterDiscoveredModels propagates SupportsVision into the arm's
Capabilities.Vision.
Tests cover RequiresVision filtering in both the happy path
(vision-only arm chosen when image present) and the fallback path
(non-vision arm rejected even as last resort).
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
3c87527 rewrote DiscoverLlamaCPP to hit /props and emit a single hardcoded
"default" entry. That breaks two cases:
1. Multi-model llama.cpp deployments (llama-swap, model-routing proxies)
are collapsed to a single arm with a placeholder ID.
2. Single-model deployments lose the real model name — arms are
registered as llamacpp/default instead of llamacpp/<actual-id>.
Restores enumeration via /v1/models (the OpenAI-compatible endpoint
llama-server exposes) while keeping the concrete n_ctx read from /props.
/props is now best-effort: failure or missing n_ctx falls back to the
documented default rather than aborting discovery.
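A sketch of how the two responses might be combined, working on raw
response bodies so it runs without a server. The function and type
names are hypothetical, and the 8192 fallback is the llama.cpp value
quoted elsewhere in this log; a nil /props body stands in for an
unreachable endpoint:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// fallbackContext is the llama.cpp default mentioned in this log.
const fallbackContext = 8192

type DiscoveredModel struct {
	ID          string
	ContextSize int
}

// buildLlamaCPPModels enumerates models from a /v1/models body and
// applies the n_ctx from a /props body to each, best-effort.
func buildLlamaCPPModels(modelsJSON, propsJSON []byte) ([]DiscoveredModel, error) {
	var list struct {
		Data []struct {
			ID string `json:"id"`
		} `json:"data"`
	}
	if err := json.Unmarshal(modelsJSON, &list); err != nil {
		return nil, fmt.Errorf("decode /v1/models: %w", err)
	}
	if len(list.Data) == 0 {
		return nil, errors.New("/v1/models returned no models")
	}

	// /props failure or a missing n_ctx must not abort discovery.
	ctx := fallbackContext
	var props struct {
		Settings struct {
			NCtx int `json:"n_ctx"`
		} `json:"default_generation_settings"`
	}
	if json.Unmarshal(propsJSON, &props) == nil && props.Settings.NCtx > 0 {
		ctx = props.Settings.NCtx
	}

	models := make([]DiscoveredModel, 0, len(list.Data))
	for _, m := range list.Data {
		models = append(models, DiscoveredModel{ID: m.ID, ContextSize: ctx})
	}
	return models, nil
}

func main() {
	models, err := buildLlamaCPPModels(
		[]byte(`{"data":[{"id":"gemma-26b"},{"id":"qwen3-coder-30b"}]}`),
		[]byte(`{"default_generation_settings":{"n_ctx":16384}}`),
	)
	fmt.Println(models, err)

	models, err = buildLlamaCPPModels([]byte(`{"data":[{"id":"gemma-26b"}]}`), nil)
	fmt.Println(models, err)
}
```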
Adds three tests: multi-model enumeration with shared context, /props
unreachable, and the empty-/v1/models error path.
3c87527 refactored DiscoverOllama and DiscoverLlamaCPP and dropped two
behaviors:
1. The Ollama toolCache prune loop. Without it, the cache grows
unbounded across reconcile cycles and stale entries linger; a
model that disappears and reappears replays an out-of-date
tool-support verdict because the cache hit skips re-probing.
2. Sensible context-size defaults. Both probes can yield
ContextSize=0 (Ollama: no num_ctx in /api/show parameters;
llama.cpp: /props default_generation_settings without n_ctx).
Registering an arm with ContextWindow=0 misroutes — the post-SLM
two-stage path treats it as a tiny model.
Restores the prune loop, applies 32768 (ollama) / 8192 (llama.cpp) as
fallbacks at discovery time, and adds three tests covering each path.
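Both restored behaviors are small. A sketch, assuming the tool cache
is a map keyed by model ID; pruneCache and contextOrDefault are
hypothetical names:

```go
package main

import "fmt"

// pruneCache drops cache entries for models that are no longer
// listed, so a model that reappears later is probed afresh.
func pruneCache(cache map[string]bool, live []string) {
	keep := make(map[string]struct{}, len(live))
	for _, id := range live {
		keep[id] = struct{}{}
	}
	for id := range cache {
		if _, ok := keep[id]; !ok {
			delete(cache, id) // deleting during range is safe in Go
		}
	}
}

// contextOrDefault replaces a zero or negative probed size with the
// backend fallback so no arm registers with ContextWindow=0.
func contextOrDefault(probed, fallback int) int {
	if probed > 0 {
		return probed
	}
	return fallback
}

func main() {
	cache := map[string]bool{"qwen3:14b": true, "removed-model:7b": false}
	pruneCache(cache, []string{"qwen3:14b"})
	fmt.Println(cache)

	fmt.Println(contextOrDefault(0, 32768))    // ollama fallback
	fmt.Println(contextOrDefault(0, 8192))     // llama.cpp fallback
	fmt.Println(contextOrDefault(4096, 32768)) // probed value wins
}
```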
Brings the project to a clean `make lint` baseline (0 issues).
Mechanical:
- Wrap deferred resp.Body.Close() in closures (router/discovery.go,
router/probe.go) so the unchecked return surfaces as `_ = ...`.
- Apply `_ = ...` (single or multi-return blank) to test-file calls
that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir
in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send /
LoadDir in tests that assert on side effects.
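For reference, the closure form of a deferred Close looks like this.
readBody and trackingCloser are illustrative helpers, not code from
the repository:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// trackingCloser records whether Close ran (demo helper only).
type trackingCloser struct {
	io.Reader
	closed bool
}

func (t *trackingCloser) Close() error {
	t.closed = true
	return nil
}

func newTrackingCloser(s string) *trackingCloser {
	return &trackingCloser{Reader: strings.NewReader(s)}
}

// readBody drains and closes body. Wrapping Close in a closure lets
// the ignored error be written as an explicit `_ =` discard.
func readBody(body io.ReadCloser) (string, error) {
	defer func() { _ = body.Close() }()
	data, err := io.ReadAll(body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	body := newTrackingCloser("ok")
	text, err := readBody(body)
	// Multi-return blank discard, as used for the Fprintf calls.
	_, _ = fmt.Fprintf(os.Stdout, "%s %v closed=%v\n", text, err, body.closed)
}
```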
Structural:
- engine.handleRequestTooLarge drops the unused req parameter and
rebuilds the request from compacted history (SA4009 — argument was
overwritten before first use).
- provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch
to tagged switches over the discriminator (QF1002).
- tui/app.go MouseWheel + inputMode and cmd/gnoma main slm-status use
tagged switches in place of equality chains (QF1003).
- cmd/gnoma main.go merges a var decl with its immediate assignment
(S1021).
- Three empty-branch sites (dispatcher_test, loader_test,
coordinator_test) become real assertions or get the dead `if` removed
(SA9003).
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
used API format (fs_ls) — now re-sanitized in translateMessage
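The json.Valid completeness check from the first bullet can be
sketched as a small accumulator. argAccumulator is a hypothetical
type; ignoring deltas once the buffer is valid JSON is an assumption
about how repeated chunks are handled:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// argAccumulator collects streamed tool-call argument fragments.
type argAccumulator struct {
	buf      strings.Builder
	complete bool
}

// Add appends a delta unless the arguments already form valid JSON.
// A lone "{" is not valid JSON, so it no longer ends accumulation.
func (a *argAccumulator) Add(delta string) {
	if a.complete {
		return
	}
	a.buf.WriteString(delta)
	a.complete = json.Valid([]byte(a.buf.String()))
}

// Args returns the accumulated text and whether it is complete.
func (a *argAccumulator) Args() (string, bool) {
	return a.buf.String(), a.complete
}

func main() {
	var acc argAccumulator
	for _, delta := range []string{"{", `"path":`, `"."}`} {
		acc.Add(delta)
		args, done := acc.Args()
		fmt.Printf("%-14q complete=%v\n", args, done)
	}
}
```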
Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
(Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
The discovery loop's reconcileArms removed the CLI-forced arm
(llamacpp/default) because the llama.cpp server reports the real model
name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm
disappeared and all subsequent requests failed.
Three-layer fix:
- Eager: query the specific provider at startup to resolve the real
model name before registering the forced arm
- Lazy: reconcileArms detects placeholder "default" arm names and
atomically renames them when discovery reveals the real identity,
with an onReconcile callback to update the session and TUI
- Guard: the forced arm is never garbage-collected by the removal loop
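The lazy and guard layers might be sketched as below, over a plain
set of arm IDs. This assumes "provider/model" IDs and renames only
when discovery reports exactly one model for the forced provider;
the onReconcile callback and the eager startup query are omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// reconcileArms syncs arms with the discovered IDs and returns the
// (possibly renamed) forced arm ID.
func reconcileArms(arms map[string]struct{}, forced string, discovered []string) string {
	live := make(map[string]struct{}, len(discovered))
	for _, id := range discovered {
		live[id] = struct{}{}
	}

	// Lazy: replace a placeholder "<provider>/default" arm once
	// discovery reveals a single real model for that provider.
	if strings.HasSuffix(forced, "/default") {
		prefix := strings.TrimSuffix(forced, "default")
		var candidates []string
		for _, id := range discovered {
			if strings.HasPrefix(id, prefix) {
				candidates = append(candidates, id)
			}
		}
		if len(candidates) == 1 {
			delete(arms, forced)
			forced = candidates[0]
			arms[forced] = struct{}{}
		}
	}

	for id := range live {
		arms[id] = struct{}{}
	}
	for id := range arms {
		if id == forced {
			continue // Guard: never garbage-collect the forced arm.
		}
		if _, ok := live[id]; !ok {
			delete(arms, id)
		}
	}
	return forced
}

func main() {
	arms := map[string]struct{}{"llamacpp/default": {}}
	forced := reconcileArms(arms, "llamacpp/default", []string{"llamacpp/gemma-26b"})
	fmt.Println(forced, len(arms))

	// The server goes away: the forced arm still survives.
	forced = reconcileArms(arms, forced, nil)
	fmt.Println(forced, len(arms))
}
```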
Also fixes misleading /init error messaging — failed inits now show
"loaded from disk (init failed)" instead of "AGENTS.md written to".
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
args in the first streaming chunk then repeats them as delta, causing
doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
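A possible shape for the tool-name handling, covering the fs: prefix
fix here and the fs.ls / fs_ls mismatch described elsewhere in this
log. Both helper names and the exact replacement rules are
assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// toInternalName maps a model-emitted "fs:grep" to the internal
// dotted form "fs.grep". Names without a colon pass through.
func toInternalName(name string) string {
	return strings.Replace(name, ":", ".", 1)
}

// toAPIName maps the internal "fs.ls" to an API-safe "fs_ls".
func toAPIName(name string) string {
	return strings.ReplaceAll(name, ".", "_")
}

func main() {
	fmt.Println(toInternalName("fs:grep"))
	fmt.Println(toInternalName("fs.grep"))
	fmt.Println(toAPIName("fs.ls"))
	fmt.Println(toAPIName(toInternalName("fs:grep")))
}
```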
cmd/gnoma:
- Early TTY detection so logger is created with correct destination
before any component gets a reference to it (fixes slog WARN bleed
into TUI textarea)
permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
text may legitimately mention .env/.ssh/credentials patterns and
should not be blocked
tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
(ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
<<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
At startup, polls ollama (/api/tags) and llama.cpp (/v1/models) for
available models. Registers each as an arm in the router alongside
the CLI-specified provider.
Discovered: 7 ollama models + 1 llama.cpp model; with the CLI-specified
provider that makes 9 total arms.
Router can now select from multiple local models based on task type.
Discovery is non-blocking — failures logged and skipped.