7213a1e2fd
Release / release (push) Has been cancelled
Empirical comparison on 2026-05-25 across three candidate SLMs on
identical prompts (two prompts: trivial 'what is 2+2' + knowledge
'explain a multi-armed bandit'):
qwen3:0.6b consistent across both prompts
functiongemma:270m works trivial, derails on knowledge prompts
gemma3:1b unusable (emits just '{' or invented keys)
reecdev/tiny3.5:1.5b unusable (ignores /no_think, leaks <Thought Process> blocks)
qwen2.5-coder:1.5b unusable (ignores classifier prompt, answers in prose)
qwen3:0.6b honours Qwen3's native /no_think flag (the distillation in
the old default did not), is smaller than the previous recommendation
(520 MB vs 1 GB), and was the only candidate to classify both test
prompts successfully without falling back to heuristic.
README quickstart block + slm-backends.md presets + status output
sample all switched. Also documents register_as_arm (default true,
set false for task-specialised models like FunctionGemma) and
classify_timeout (default 15s) in the example configs since both
landed in v0.3.3+.
Code defaults for the tiny3.5 family in internal/router/defaults.go
are unchanged — that table still applies when users have tiny3.5
registered as a routing arm independent of the SLM role.