Discovery previously registered every model returned by Ollama as a
chat arm, including embeddings, ASR, TTS, audio realtime, and
rerankers — which then failed at inference time when the router
selected them. Local arms also shipped with all-zero defaults, so
selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b
was effectively random.
This change covers tasks R-1, R-2, and R-6 from the routing-defaults plan.
- nonChatModelPatterns + isNonChatModel substring matcher; matched
IDs are skipped during RegisterDiscoveredModels. Covers whisper,
moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding,
embeddinggemma, -reranker, lfm2.
- knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3
and minicpm-v entries stay for regression coverage.
- New internal/router/defaults.go with FamilyDefaults struct,
knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix
lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b
resolves to "tiny3.5"). Single entry for now: functiongemma is
registered with Disabled=true and MaxComplexity=0.40, reserved for
the future ArmRoleToolRouter path. Table will grow in R-3.
- RegisterDiscoveredModels consults ResolveFamilyDefaults and only
populates fields that are still zero on the arm, so user [[arms]]
overrides keep priority.
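A minimal sketch of the two lookups described above, assuming lowercase
matching and a FamilyDefaults struct reduced to the two fields named here.
The pattern list and the functiongemma entry come from this message; the
function bodies and the acme/functiongemma:270m ID are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// nonChatModelPatterns mirrors the substrings listed in this change.
var nonChatModelPatterns = []string{
	"whisper", "moonshine", "kokoros", "vibevoice",
	"-asr", "-tts", "-audio", "-embedding",
	"embeddinggemma", "-reranker", "lfm2",
}

// isNonChatModel reports whether id contains any non-chat pattern.
// Lowercasing the ID first is an assumption of this sketch.
func isNonChatModel(id string) bool {
	id = strings.ToLower(id)
	for _, p := range nonChatModelPatterns {
		if strings.Contains(id, p) {
			return true
		}
	}
	return false
}

// FamilyDefaults holds only the fields mentioned in the message.
type FamilyDefaults struct {
	Disabled      bool
	MaxComplexity float64
}

var knownFamilyDefaults = map[string]FamilyDefaults{
	"functiongemma": {Disabled: true, MaxComplexity: 0.40},
}

// ResolveFamilyDefaults strips any org/ namespace, then returns the entry
// whose key is the longest prefix of the remaining model name.
func ResolveFamilyDefaults(id string) (FamilyDefaults, bool) {
	if i := strings.LastIndex(id, "/"); i >= 0 {
		id = id[i+1:]
	}
	id = strings.ToLower(id)
	best := ""
	for prefix := range knownFamilyDefaults {
		if strings.HasPrefix(id, prefix) && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best == "" {
		return FamilyDefaults{}, false
	}
	return knownFamilyDefaults[best], true
}

func main() {
	// The last ID is hypothetical and only exercises namespace stripping.
	for _, id := range []string{"whisper-large-v3", "qwen3-coder:30b", "acme/functiongemma:270m"} {
		d, ok := ResolveFamilyDefaults(id)
		fmt.Println(id, isNonChatModel(id), ok, d.Disabled, d.MaxComplexity)
	}
}
```

With a single table entry the longest-prefix rule is trivially satisfied;
it only starts to matter once overlapping family keys are added.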
Plans:
- docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
- docs/superpowers/plans/2026-05-23-tool-router-specialization.md
TODO.md surfaces both as in-flight items.
Task gains a RequiresVision bool; filterFeasible enforces it on
both the primary feasibility pass and the last-resort fallback
(no degradation to a non-vision arm — the model literally cannot
consume image bytes).
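The hard-constraint behaviour can be sketched as follows. This is not the
router's actual filterFeasible: the Arm and Task shapes are cut down, and
the complexity check stands in for whatever soft limits the primary pass
really applies.

```go
package main

import "fmt"

type Arm struct {
	ID            string
	Vision        bool
	MaxComplexity float64 // stand-in for the soft limits (assumption)
}

type Task struct {
	RequiresVision bool
	Complexity     float64
}

// filterFeasible returns the arms that pass every check, or, if none do,
// the last-resort set with soft limits relaxed. The vision requirement is
// applied before either set is built, so it is never relaxed.
func filterFeasible(t Task, arms []Arm) []Arm {
	var primary, fallback []Arm
	for _, a := range arms {
		if t.RequiresVision && !a.Vision {
			continue // hard constraint: excluded from both passes
		}
		fallback = append(fallback, a)
		if t.Complexity <= a.MaxComplexity {
			primary = append(primary, a)
		}
	}
	if len(primary) > 0 {
		return primary
	}
	return fallback
}

func main() {
	arms := []Arm{
		{ID: "text-large", MaxComplexity: 1.0},
		{ID: "vision-small", Vision: true, MaxComplexity: 0.3},
	}
	for _, a := range filterFeasible(Task{RequiresVision: true, Complexity: 0.9}, arms) {
		fmt.Println(a.ID)
	}
}
```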
Ollama discovery now probes /api/show for vision capability:
- details.families containing "clip" / "mllama" / "*vl"
- capabilities array containing "vision" (newer Ollama)
- name-prefix fallback for releases that predate either
(llava, qwen2.5-vl, llama3.2-vision, moondream, pixtral, etc.)
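A rough sketch of the three checks, assuming the /api/show body has
already been read into a byte slice. The struct covers only the fields
named above, "*vl" is interpreted as a suffix match, and supportsVision
is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// showResponse decodes just the parts of /api/show used for vision detection.
type showResponse struct {
	Details struct {
		Families []string `json:"families"`
	} `json:"details"`
	Capabilities []string `json:"capabilities"`
}

// Name prefixes taken from the fallback list in this message.
var visionNamePrefixes = []string{
	"llava", "qwen2.5-vl", "llama3.2-vision", "moondream", "pixtral",
}

// supportsVision applies the checks in order: families, capabilities,
// then the name-prefix fallback for releases that report neither.
func supportsVision(name string, body []byte) bool {
	var resp showResponse
	if err := json.Unmarshal(body, &resp); err == nil {
		for _, f := range resp.Details.Families {
			f = strings.ToLower(f)
			if f == "clip" || f == "mllama" || strings.HasSuffix(f, "vl") {
				return true
			}
		}
		for _, c := range resp.Capabilities {
			if c == "vision" {
				return true
			}
		}
	}
	name = strings.ToLower(name)
	for _, p := range visionNamePrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsVision("llava:7b", []byte(`{}`)))
	fmt.Println(supportsVision("phi-4:14b", []byte(`{"details":{"families":["phi3"]}}`)))
}
```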
OllamaProbeResult replaces the map[string]bool tool cache so the
single /api/show call can populate tools + vision + ctx-size in
one probe. DiscoverOllama / DiscoverLocalModels signatures updated;
nil-cache callers in cmd/gnoma keep working unchanged.
RegisterDiscoveredModels propagates SupportsVision into the arm's
Capabilities.Vision.
Tests cover RequiresVision filtering in both the happy path
(vision-only arm chosen when image present) and the fallback path
(non-vision arm rejected even as last resort).
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
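For reference, the discard form looks like this. The parseCtx helper and
its input format are made up for illustration and are not the code in
internal/router/discovery.

```go
package main

import (
	"fmt"
	"os"
)

// parseCtx reads an integer after the "num_ctx" keyword. The scan result is
// discarded on purpose: on failure n simply stays 0.
func parseCtx(s string) int {
	var n int
	_, _ = fmt.Sscanf(s, "num_ctx %d", &n)
	return n
}

func main() {
	// Both return values are discarded explicitly so errcheck stays quiet.
	_, _ = fmt.Fprintf(os.Stdout, "ctx=%d\n", parseCtx("num_ctx 4096"))
}
```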
The router.SecureProvider interface previously required a public
IsSecure() bool method. Any test mock, or any future production type,
could satisfy it by returning true, so the W1 "only wrapped providers
may flow past the boundary" contract held by convention only, not at
the type level.
Replaces IsSecure() bool with a sealed security.Marker interface whose
single method, secured(), is unexported. Go's method-set semantics key
unexported methods by their defining package, so a type outside
internal/security cannot satisfy Marker by declaring its own secured();
it would have to embed a type that already does. *SafeProvider gets the
lone secured() implementation; router.SecureProvider embeds Marker.
The seal forces every test mock that previously implemented IsSecure()
to either (a) be wrapped with security.WrapProvider(mp, nil) at the use
site, or (b) drop the method entirely if the mock never flows through
SecureProvider. 93 use sites across 11 test files were updated via a
per-package secureMock helper. WrapProvider with a nil firewall ref is
a no-op pass-through, so test behavior is unchanged.
Empirically: a type from outside internal/security can declare
`secured()` but the compiler will reject assigning it to
router.SecureProvider because the unexported method belongs to the
other package's namespace. Convention → compile-time guarantee.
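The shape of the seal, collapsed into one file so it compiles on its own.
In a single package the seal has no teeth, so the comments mark where the
package boundary sits in the real layout. newSafeProvider and Name are
hypothetical stand-ins for security.WrapProvider and the provider methods.

```go
package main

import "fmt"

// --- would live in internal/security ---

// Marker is exported, but its only method is not, so a type in another
// package cannot implement it by declaring a method of its own.
type Marker interface {
	secured()
}

type SafeProvider struct {
	name string
}

func (*SafeProvider) secured() {}

func (p *SafeProvider) Name() string { return p.name }

// newSafeProvider is a simplified, hypothetical constructor.
func newSafeProvider(name string) *SafeProvider {
	return &SafeProvider{name: name}
}

// --- would live in internal/router ---

// SecureProvider embeds Marker, so only sealed types can be assigned to it.
type SecureProvider interface {
	Marker
	Name() string
}

func main() {
	var p SecureProvider = newSafeProvider("local")
	fmt.Println(p.Name())
}
```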
3c87527 rewrote DiscoverLlamaCPP to hit /props and emit a single hardcoded
"default" entry. That breaks two cases:
1. Multi-model llama.cpp deployments (llama-swap, model-routing proxies)
are collapsed to a single arm with a placeholder ID.
2. Single-model deployments lose the real model name — arms are
registered as llamacpp/default instead of llamacpp/<actual-id>.
Restores enumeration via /v1/models (the OpenAI-compatible endpoint
llama-server exposes) while keeping the concrete n_ctx read from /props.
/props is now best-effort: failure or missing n_ctx falls back to the
documented default rather than aborting discovery.
Adds three tests: multi-model enumeration with shared context, /props
unreachable, and the empty-/v1/models error path.
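The restored flow, sketched with the HTTP layer replaced by an injected
fetch function so the example runs without a server. The JSON shapes
follow the endpoints named above; fetchFunc, fakeServer, and
DiscoveredModel are hypothetical, and 8192 is the llama.cpp fallback
quoted elsewhere in this log.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const defaultLlamaCPPContext = 8192

type DiscoveredModel struct {
	ID          string
	ContextSize int
}

// fetchFunc returns the response body for a path on the llama.cpp server.
type fetchFunc func(path string) ([]byte, error)

// fakeServer serves canned bodies; unknown paths behave as unreachable.
func fakeServer(routes map[string]string) fetchFunc {
	return func(path string) ([]byte, error) {
		body, ok := routes[path]
		if !ok {
			return nil, fmt.Errorf("GET %s: unreachable", path)
		}
		return []byte(body), nil
	}
}

func discoverLlamaCPP(fetch fetchFunc) ([]DiscoveredModel, error) {
	body, err := fetch("/v1/models")
	if err != nil {
		return nil, fmt.Errorf("llama.cpp discovery: %w", err)
	}
	var list struct {
		Data []struct {
			ID string `json:"id"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("llama.cpp discovery: %w", err)
	}
	if len(list.Data) == 0 {
		return nil, errors.New("llama.cpp discovery: /v1/models returned no models")
	}

	// /props is best-effort: any failure or a missing n_ctx keeps the default.
	ctx := defaultLlamaCPPContext
	if body, err := fetch("/props"); err == nil {
		var props struct {
			Settings struct {
				NCtx int `json:"n_ctx"`
			} `json:"default_generation_settings"`
		}
		if json.Unmarshal(body, &props) == nil && props.Settings.NCtx > 0 {
			ctx = props.Settings.NCtx
		}
	}

	out := make([]DiscoveredModel, 0, len(list.Data))
	for _, m := range list.Data {
		out = append(out, DiscoveredModel{ID: m.ID, ContextSize: ctx})
	}
	return out, nil
}

func main() {
	fetch := fakeServer(map[string]string{
		"/v1/models": `{"data":[{"id":"model-a"},{"id":"model-b"}]}`,
	})
	models, err := discoverLlamaCPP(fetch)
	fmt.Println(models, err)
}
```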
3c87527 refactored DiscoverOllama and DiscoverLlamaCPP and dropped two
behaviors:
1. The Ollama toolCache prune loop. Without it, the cache grows
unbounded across reconcile cycles and stale entries linger; a
model that disappears and reappears replays an out-of-date
tool-support verdict because the cache hit skips re-probing.
2. Sensible context-size defaults. Both probes can yield
ContextSize=0 (Ollama: no num_ctx in /api/show parameters;
llama.cpp: /props default_generation_settings without n_ctx).
Registering an arm with ContextWindow=0 misroutes — the post-SLM
two-stage path treats it as a tiny model.
Restores the prune loop, applies 32768 (ollama) / 8192 (llama.cpp) as
fallbacks at discovery time, and adds three tests covering each path.
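Both restored behaviours are small enough to sketch in a few lines. The
helper names are hypothetical; the cache type and the two fallback values
are the ones stated above.

```go
package main

import "fmt"

const (
	defaultOllamaContext   = 32768
	defaultLlamaCPPContext = 8192
)

// pruneToolCache drops entries for models missing from the latest listing,
// so a model that reappears later is probed again instead of hitting a
// stale verdict.
func pruneToolCache(cache map[string]bool, liveIDs []string) {
	live := make(map[string]struct{}, len(liveIDs))
	for _, id := range liveIDs {
		live[id] = struct{}{}
	}
	for id := range cache {
		if _, ok := live[id]; !ok {
			delete(cache, id) // deleting during range is safe in Go
		}
	}
}

// contextOrDefault replaces a zero or negative probe result with the fallback.
func contextOrDefault(probed, fallback int) int {
	if probed > 0 {
		return probed
	}
	return fallback
}

func main() {
	cache := map[string]bool{"qwen3-coder:30b": true, "removed-model": false}
	pruneToolCache(cache, []string{"qwen3-coder:30b"})
	fmt.Println(len(cache), contextOrDefault(0, defaultOllamaContext))
}
```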
The discovery loop's reconcileArms removed the CLI-forced arm
(llamacpp/default) because the llama.cpp server reports the real model
name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm
disappeared and all subsequent requests failed.
Three-layer fix:
- Eager: query the specific provider at startup to resolve the real
model name before registering the forced arm
- Lazy: reconcileArms detects placeholder "default" arm names and
atomically renames them when discovery reveals the real identity,
with an onReconcile callback to update the session and TUI
- Guard: the forced arm is never garbage-collected by the removal loop
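The lazy and guard layers can be sketched like this, assuming arms are
keyed by "provider/model" and that the first discovered model from the
same provider is the real identity. Locking is omitted, so the rename is
not atomic here; the Arm shape is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type Arm struct {
	ID     string
	Forced bool // true for the arm pinned on the command line
}

func reconcileArms(arms map[string]*Arm, discovered []string, onReconcile func(oldID, newID string)) {
	live := make(map[string]bool, len(discovered))
	for _, id := range discovered {
		live[id] = true
	}

	// Lazy: collect forced placeholders first so the map is not grown
	// while it is being ranged over.
	var placeholders []string
	for id, arm := range arms {
		if arm.Forced && strings.HasSuffix(id, "/default") && !live[id] {
			placeholders = append(placeholders, id)
		}
	}
	for _, oldID := range placeholders {
		provider, _, _ := strings.Cut(oldID, "/")
		for _, newID := range discovered {
			if !strings.HasPrefix(newID, provider+"/") {
				continue
			}
			arm := arms[oldID]
			delete(arms, oldID)
			arm.ID = newID
			arms[newID] = arm
			if onReconcile != nil {
				onReconcile(oldID, newID) // lets the session and TUI follow the rename
			}
			break
		}
	}

	// Register newly discovered arms.
	for _, id := range discovered {
		if _, ok := arms[id]; !ok {
			arms[id] = &Arm{ID: id}
		}
	}
	// Guard: forced arms survive even when discovery no longer lists them.
	for id, arm := range arms {
		if !live[id] && !arm.Forced {
			delete(arms, id)
		}
	}
}

func main() {
	arms := map[string]*Arm{"llamacpp/default": {ID: "llamacpp/default", Forced: true}}
	reconcileArms(arms, []string{"llamacpp/gemma-26b"}, func(oldID, newID string) {
		fmt.Println("renamed", oldID, "to", newID)
	})
	fmt.Println(len(arms))
}
```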
Also fixes misleading /init error messaging — failed inits now show
"loaded from disk (init failed)" instead of "AGENTS.md written to".