Two structural fixes for the SLM classifier's 100% failure rate:
(1) Pass ResponseFormat=json_object + Temperature=0 + TopP=1 +
MaxTokens=128 in the classifier Request. The provider type already
supports these but callSLM was leaving them unset, which meant ollama
(and any other backend) ran with default sampling and free-form text
output. format=json mode in particular makes ollama emit only valid
JSON at decoding time — eliminates the majority of parse failures.
(2) Harden extractJSON to strip common thinking-block tags before
hunting for the brace. Seen in the wild: <think>…</think> (Qwen3
distillations) and <Thought Process>…</Thought Process> (tiny3.5).
Defensive list also covers <reasoning>, <thoughts>. Unterminated
thinking blocks fall back to brace-search so we still have a shot.
Table-driven tests cover all variants plus the no-tag and
fenced-json paths to confirm no regression.
Even with format=json on a capable provider, the extractor is the
safety net for backends that don't enforce format strictly — same
defence-in-depth shape as the existing fence stripping.
Doesn't fix the deeper architecture question (encoder + bandit
preferred over decoder-SLM as classifier — see plan doc landing in
the same PR); fixes the immediate bug.
Five fixes folded into one commit because they all answer the same
question: 'why does my router stats output lie to me?'
Issue 1 (timeout). Default classify timeout was 5s — too short for
cold-start ollama loads on small models. Bumped to 15s and surfaced
as [slm].classify_timeout (0 = built-in default). Empirically caught
when a user's reecdev/tiny3.5:1.5b hit 'stream error: context
deadline exceeded' on every single classify call.
Issue 2 (Warn-level error). The SLM-fallback path logged the
underlying error at Debug, invisible without --verbose. Promoted to
Warn so a first-time misconfiguration surfaces immediately. The
fallback itself is benign; the signal is that the SLM isn't doing
the work it was supposed to.
Issue 3 (stats hint). Hard-coded 'check that llamafile boots' even
when the user is on ollama. Replaced with backend-templated advice
read from cfg.SLM.Backend. Also distinguishes three diagnostic
cases that were collapsed before:
- SLM never called (zero attempts)
- SLM called N times but every call fell back (timeout/parse)
- SLM working but minority share
Issue 4 (effective heuristic share). The classifier breakdown
shows 'heuristic' and 'slm_fallback' as separate sources, but both
routed through HeuristicClassifier — only the source tag differs.
New line under 'total observations' surfaces the combined share
honestly: 'effective heuristic share: 100% (44 fallbacks + 10
pure heuristic)'.
Issue 5 (config schema). [slm].classify_timeout joins the existing
[slm] knobs alongside startup_timeout. Documented inline with the
cold-start-load rationale.
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
The router.SecureProvider interface previously required a public
IsSecure() bool method. Any test mock — or future production type —
could satisfy it by returning true, defeating the W1 "only wrapped
providers may flow past the boundary" contract through convention
rather than at the type level.
Replaces IsSecure() bool with an unexported security.Marker interface
that has a single secured() method. Go's method-set semantics key
unexported methods by their defining package, so only types declared in
internal/security can satisfy Marker. *SafeProvider gets the lone
secured() implementation; router.SecureProvider embeds Marker.
The seal forces every test mock that previously implemented IsSecure()
to either (a) be wrapped with security.WrapProvider(mp, nil) at the use
site, or (b) drop the method entirely if the mock never flows through
SecureProvider. 93 use sites across 11 test files were updated via a
per-package secureMock helper. WrapProvider with a nil firewall ref is
a no-op pass-through, so test behavior is unchanged.
Empirically: a type from outside internal/security can declare
`secured()` but the compiler will reject assigning it to
router.SecureProvider because the unexported method belongs to the
other package's namespace. Convention → compile-time guarantee.
The Debug floor (0.4) added in eb0583f was bumping the SLM-returned
0.25 up, breaking the HappyPath assertion. Bump the SLM value to 0.55
so the test still verifies "SLM value preserved" (its original
intent), and add a dedicated TestClassifier_AppliesTaskTypeFloor that
exercises the under-reporting case the floor was added to handle.
Phase 4 routing decisions depend on knowing whether the SLM classifier
is actually firing or whether the heuristic is silently doing all the
work. Adds the instrumentation to make that observable.
router.ClassifierSource enum (heuristic / slm / slm_fallback) is set
on Task by every classifier:
- HeuristicClassifier → ClassifierHeuristic
- slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when
the SLM call fails or returns unparseable output
The source is plumbed through router.Outcome to QualityTracker, which
now maintains per-source counters alongside the existing per-arm × task
EMA scores. QualitySnapshot serializes both (classifier_counts is
omitempty for back-compat with pre-feature quality.json files).
lazyClassifier logs at INFO the first time it falls back to heuristic
because the SLM hasn't booted yet — distinguishes operational fallback
from an unconfigured-SLM run.
slm.Manager.Start() now records elapsed-to-healthy and the main.go
goroutine logs it as part of the "SLM ready" event. Confirms whether
short-lived runs are racing the boot cycle.
New `gnoma router stats` subcommand prints both tables (arm × task
quality, classifier source breakdown) from quality.json with a Phase 4
trust hint when the data is too sparse or the SLM share is low.
6 new tests cover ClassifierSource string/enum, heuristic + SLM source
propagation, QualityTracker counter round-trip, and back-compat
restore from a legacy quality.json without classifier_counts.
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
heuristic baseline blended so Priority/RequiredEffort are never zeroed,
extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama