vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile cold-start shut the SLM out of pipe-mode runs, which
     always finished before the health check (up to 15 s) passed
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.
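
The "auto" probe order can be pictured with the minimal sketch below.
It is not the code in backend.go: candidate, reachable, firstReachable
and the httptest servers are hypothetical stand-ins, and only the two
probe paths (/api/tags, /props) come from this change.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// candidate pairs a backend name with a cheap reachability probe.
type candidate struct {
	name  string
	probe func() bool
}

// reachable reports whether GET url answers 200 OK within timeout.
func reachable(url string, timeout time.Duration) bool {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// firstReachable walks the candidates in order and returns the first whose
// probe succeeds, or "" when none do (the caller would then stay on the
// heuristic classifier).
func firstReachable(cands []candidate) string {
	for _, c := range cands {
		if c.probe() {
			return c.name
		}
	}
	return ""
}

// status returns a handler that answers every request with the given code.
func status(code int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(code)
	})
}

// demo simulates a host where the first daemon is unhealthy and the second
// one is up. Both servers are fakes; no real Ollama or llama.cpp is needed.
func demo() string {
	down := httptest.NewServer(status(http.StatusServiceUnavailable))
	defer down.Close()
	up := httptest.NewServer(status(http.StatusOK))
	defer up.Close()

	return firstReachable([]candidate{
		{"ollama", func() bool { return reachable(down.URL+"/api/tags", time.Second) }},
		{"llamacpp", func() bool { return reachable(up.URL+"/props", time.Second) }},
	})
}

func main() {
	fmt.Println(demo()) // prints "llamacpp"
}
```

Keeping each probe behind a short timeout means a missing daemon costs
at most that timeout before the next candidate is tried.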

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
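
A minimal sketch of such a heuristic follows. Everything in it is an
assumption made for illustration: the TaskType values, the 20-word
cut-off and the requiresTools helper are hypothetical, the token list
holds only the tokens named above, and the complexity check is left
out. It is not the ClassifyTask implementation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// TaskType is a hypothetical stand-in for the router's task-type enum.
type TaskType string

const (
	TaskExplain     TaskType = "explain"
	TaskGeneration  TaskType = "generation"
	TaskBoilerplate TaskType = "boilerplate"
	TaskDebug       TaskType = "debug"
)

// maxTrivialWords is an assumed cut-off for "short"; the real threshold is
// not stated in this commit message.
const maxTrivialWords = 20

// toolTokens lists only the tokens named in the commit message.
var toolTokens = map[string]bool{
	"read": true, "write": true, "run": true, "test": true, "file": true,
}

// knowledgeOnly marks task types that do not imply existing code.
var knowledgeOnly = map[TaskType]bool{
	TaskExplain: true, TaskGeneration: true, TaskBoilerplate: true,
}

// requiresTools returns false only for short prompts of a knowledge-only
// task type that contain no tool-needing token.
func requiresTools(prompt string, tt TaskType) bool {
	if !knowledgeOnly[tt] {
		return true
	}
	// Split on anything that is not a letter or digit, so "main.go" yields
	// "main" and "go" and punctuation never hides a token.
	words := strings.FieldsFunc(strings.ToLower(prompt), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	if len(words) > maxTrivialWords {
		return true
	}
	for _, w := range words {
		if toolTokens[w] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(requiresTools("explain what a mutex is", TaskExplain))     // false
	fmt.Println(requiresTools("read main.go and explain it", TaskExplain)) // true
	fmt.Println(requiresTools("why does this panic", TaskDebug))           // true
}
```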

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
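
The interplay between the feasibility filter and the tier can be
sketched as below. The Arm, Task and Kind types, the numeric
complexity scale and the convention that MaxComplexity 0 means "no
ceiling" are assumptions for illustration; only the tier numbers
follow the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// Kind is a hypothetical arm category.
type Kind int

const (
	KindCLI Kind = iota
	KindLocal
	KindAPI
)

// Arm is a simplified router arm. MaxComplexity 0 means "no ceiling"
// (an assumption of this sketch).
type Arm struct {
	Name          string
	Kind          Kind
	MaxComplexity int
}

// Task carries only the field the ordering needs here.
type Task struct{ Complexity int }

// feasible drops arms whose ceiling is below the task's complexity.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.Complexity <= a.MaxComplexity
}

// armTier ranks arms for a task: a capped arm whose ceiling fits is tier 0,
// then CLI agents (1), local (2), API (3).
func armTier(a Arm, t Task) int {
	if a.MaxComplexity > 0 && t.Complexity <= a.MaxComplexity {
		return 0
	}
	switch a.Kind {
	case KindCLI:
		return 1
	case KindLocal:
		return 2
	default:
		return 3
	}
}

// pick filters to feasible arms and returns the one with the lowest tier.
func pick(arms []Arm, t Task) (Arm, bool) {
	var cands []Arm
	for _, a := range arms {
		if feasible(a, t) {
			cands = append(cands, a)
		}
	}
	if len(cands) == 0 {
		return Arm{}, false
	}
	sort.SliceStable(cands, func(i, j int) bool {
		return armTier(cands[i], t) < armTier(cands[j], t)
	})
	return cands[0], true
}

// pickName runs pick against a fixed demo set of arms.
func pickName(complexity int) string {
	arms := []Arm{
		{Name: "api", Kind: KindAPI},
		{Name: "cli-agent", Kind: KindCLI},
		{Name: "slm", Kind: KindLocal, MaxComplexity: 2},
	}
	a, ok := pick(arms, Task{Complexity: complexity})
	if !ok {
		return ""
	}
	return a.Name
}

func main() {
	fmt.Println(pickName(1)) // prints "slm"
	fmt.Println(pickName(5)) // prints "cli-agent"
}
```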

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
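
A deadline-respecting health wait might look like the sketch below.
This waitHealthy is a standalone illustration with an assumed
signature (URL and poll interval as parameters), not the method on the
llamafile manager; the flaky test server is likewise hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitHealthy polls url until it answers 200 OK or ctx is done. The only
// ceiling is the caller's context; there is no built-in timeout.
func waitHealthy(ctx context.Context, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("wait healthy: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// flaky answers 503 for the first `failures` requests, then 200.
func flaky(failures int32) http.Handler {
	var calls int32
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if atomic.AddInt32(&calls, 1) <= failures {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// demoReady waits on a server that becomes healthy on its third request.
func demoReady() error {
	srv := httptest.NewServer(flaky(2))
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitHealthy(ctx, srv.URL, 10*time.Millisecond)
}

// demoTimesOut waits on a server that never becomes healthy and reports
// whether the wait ended with the context's deadline error.
func demoTimesOut() bool {
	srv := httptest.NewServer(flaky(1 << 30))
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	err := waitHealthy(ctx, srv.URL, 10*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(demoReady())    // prints "<nil>"
	fmt.Println(demoTimesOut()) // prints "true"
}
```

Wrapping ctx.Err() with %w lets callers tell a timeout from a
cancellation with errors.Is.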

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.
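
One way to tolerate a reply that wraps the JSON in extra text is
sketched below. The verdict shape and the parseVerdict helper are
hypothetical; the classifier's actual contract is not shown in this
commit message. When parsing fails, the caller would fall back to the
heuristic classifier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// verdict is a hypothetical classifier reply shape.
type verdict struct {
	TaskType   string `json:"task_type"`
	Complexity int    `json:"complexity"`
}

// parseVerdict extracts the outermost {...} span from reply and decodes it.
// It returns ok=false when there is no such span, the JSON is invalid, or
// the task type is missing.
func parseVerdict(reply string) (verdict, bool) {
	start := strings.Index(reply, "{")
	end := strings.LastIndex(reply, "}")
	if start < 0 || end < start {
		return verdict{}, false
	}
	var v verdict
	if err := json.Unmarshal([]byte(reply[start:end+1]), &v); err != nil {
		return verdict{}, false
	}
	if v.TaskType == "" {
		return verdict{}, false
	}
	return v, true
}

func main() {
	v, ok := parseVerdict(`Sure! {"task_type":"explain","complexity":1}`)
	fmt.Println(v.TaskType, v.Complexity, ok) // prints "explain 1 true"

	_, ok = parseVerdict("I think this is an explain task")
	fmt.Println(ok) // prints "false"
}
```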

Probe + telemetry surfaces

gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` has shown the classifier-source mix since the
previous commit; with this change the slm / slm_fallback / heuristic
shares should finally move from "always heuristic" to something
reflecting real SLM activity.

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)
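
The httptest-backed probe tests follow a pattern like the sketch
below. It is a standalone program rather than the package's test file:
hasToolsCapability mirrors the /api/show capability check, while
fakeOllama and probeWith are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// hasToolsCapability POSTs {"model": ...} to /api/show and reports whether
// the reply lists the "tools" capability. Any failure counts as "no tools".
func hasToolsCapability(baseURL, model string) bool {
	body, err := json.Marshal(map[string]string{"model": model})
	if err != nil {
		return false
	}
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Post(baseURL+"/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var r struct {
		Capabilities []string `json:"capabilities"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return false
	}
	for _, c := range r.Capabilities {
		if c == "tools" {
			return true
		}
	}
	return false
}

// fakeOllama serves a canned /api/show reply with the given capabilities.
func fakeOllama(caps []string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/api/show" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string][]string{"capabilities": caps})
	}))
}

// probeWith runs the probe against a fake daemon advertising caps.
func probeWith(caps []string) bool {
	srv := fakeOllama(caps)
	defer srv.Close()
	return hasToolsCapability(srv.URL, "any-model")
}

func main() {
	fmt.Println(probeWith([]string{"completion", "tools"})) // prints "true"
	fmt.Println(probeWith([]string{"completion"}))          // prints "false"
}
```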

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00

370 lines
11 KiB
Go

package slm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
	"net/http"
	"sort"
	"strings"
	"time"

	"somegit.dev/Owlibou/gnoma/internal/provider"
	"somegit.dev/Owlibou/gnoma/internal/provider/openaicompat"
)

// Backend identifies an SLM execution backend.
type Backend string

const (
	BackendAuto         Backend = "auto"
	BackendOllama       Backend = "ollama"
	BackendLlamaCpp     Backend = "llamacpp"
	BackendLlamafile    Backend = "llamafile"
	BackendOpenAICompat Backend = "openaicompat"
	BackendDisabled     Backend = "disabled"
)

// BackendConfig is the subset of config.SLMSection that StartBackend needs.
// Decoupled from the config package so the slm package can be imported from
// anywhere without a dependency cycle.
type BackendConfig struct {
	Backend        Backend
	Model          string
	BaseURL        string
	ModelURL       string
	DataDir        string
	StartupTimeout time.Duration
	// ToolSupport is the user-asserted tool capability for backends we can't
	// probe generically (openaicompat). The other backends ignore it and
	// always use their own probe result.
	ToolSupport bool
}

// Boot is a started SLM backend, ready to act as a provider.Provider for the
// classifier and as a router arm. Close is always non-nil; for stateless
// backends (Ollama, llamacpp, openaicompat) it's a no-op.
type Boot struct {
	Backend     Backend
	Provider    provider.Provider
	Model       string
	BootTime    time.Duration
	ToolSupport bool // true when the underlying model is known to handle tool calls
	Close       func() error
}

// StartBackend dispatches by cfg.Backend and returns a started SLM. Returns
// (nil, nil) when the chosen backend is "disabled" or when "auto" found no
// available backend; callers then stay on the heuristic classifier silently.
// Returns a non-nil error when the configuration itself is broken (unknown
// backend, missing required field for an explicit choice) or when a selected
// backend fails to start.
func StartBackend(ctx context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	if logger == nil {
		logger = slog.Default()
	}
	backend := cfg.Backend
	if backend == "" {
		backend = BackendAuto
	}
	switch backend {
	case BackendDisabled:
		return nil, nil
	case BackendOllama:
		return startOllama(ctx, cfg, logger)
	case BackendLlamaCpp:
		return startLlamaCpp(ctx, cfg, logger)
	case BackendLlamafile:
		return startLlamafile(ctx, cfg, logger)
	case BackendOpenAICompat:
		return startOpenAICompat(ctx, cfg, logger)
	case BackendAuto:
		return autoStart(ctx, cfg, logger)
	default:
		return nil, fmt.Errorf("slm: unknown backend %q", backend)
	}
}

// ---- Backend implementations --------------------------------------------

const (
	ollamaDefaultURL   = "http://localhost:11434"
	llamacppDefaultURL = "http://localhost:8080"
)

// startOllama wires up an already-running Ollama daemon. With no model
// configured it falls back to the smallest model the daemon reports. gnoma
// neither starts nor stops the daemon, so Close is a no-op.
func startOllama(_ context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	baseURL := strings.TrimRight(cfg.BaseURL, "/")
	if baseURL == "" {
		baseURL = ollamaDefaultURL
	}
	model := cfg.Model
	if model == "" {
		// Try to pick a sensible default model.
		picked, ok := pickSmallestOllamaModel(baseURL)
		if !ok {
			return nil, fmt.Errorf("slm: ollama backend requires [slm] model, and no models were reachable at %s", baseURL)
		}
		model = picked
		logger.Info("slm: auto-picked Ollama model", "model", model, "base_url", baseURL)
	}
	apiURL := baseURL + "/v1"
	begin := time.Now()
	prov, err := openaicompat.NewOllama(provider.ProviderConfig{BaseURL: apiURL})
	if err != nil {
		return nil, fmt.Errorf("slm: ollama provider: %w", err)
	}
	return &Boot{
		Backend:     BackendOllama,
		Provider:    prov,
		Model:       model,
		BootTime:    time.Since(begin),
		ToolSupport: probeOllamaToolSupport(baseURL, model),
		Close:       func() error { return nil },
	}, nil
}

// startLlamaCpp wires up an already-running llama.cpp server. Tool support
// is read from the server's /props endpoint; Close is a no-op.
func startLlamaCpp(_ context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	baseURL := strings.TrimRight(cfg.BaseURL, "/")
	if baseURL == "" {
		baseURL = llamacppDefaultURL
	}
	model := cfg.Model
	if model == "" {
		model = "default" // llama.cpp server ignores the model field
	}
	apiURL := baseURL + "/v1"
	begin := time.Now()
	prov, err := openaicompat.NewLlamaCpp(provider.ProviderConfig{BaseURL: apiURL})
	if err != nil {
		return nil, fmt.Errorf("slm: llamacpp provider: %w", err)
	}
	logger.Info("slm: using llama.cpp backend", "base_url", baseURL, "model", model)
	return &Boot{
		Backend:     BackendLlamaCpp,
		Provider:    prov,
		Model:       model,
		BootTime:    time.Since(begin),
		ToolSupport: probeLlamacppToolSupport(baseURL),
		Close:       func() error { return nil },
	}, nil
}

// startOpenAICompat wires up a generic OpenAI-compatible endpoint. Both
// base_url and model are required. BaseURL is used as given (no "/v1" suffix
// is appended), and tool support is taken from cfg.ToolSupport.
func startOpenAICompat(_ context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	baseURL := strings.TrimRight(cfg.BaseURL, "/")
	if baseURL == "" {
		return nil, fmt.Errorf("slm: openaicompat backend requires [slm] base_url")
	}
	model := cfg.Model
	if model == "" {
		return nil, fmt.Errorf("slm: openaicompat backend requires [slm] model")
	}
	begin := time.Now()
	prov, err := openaicompat.NewLlamafile(provider.ProviderConfig{BaseURL: baseURL})
	if err != nil {
		return nil, fmt.Errorf("slm: openaicompat provider: %w", err)
	}
	logger.Info("slm: using openai-compatible backend", "base_url", baseURL, "model", model)
	return &Boot{
		Backend:     BackendOpenAICompat,
		Provider:    prov,
		Model:       model,
		BootTime:    time.Since(begin),
		ToolSupport: cfg.ToolSupport, // user-asserted; no generic probe
		Close:       func() error { return nil },
	}, nil
}

// startLlamafile starts the gnoma-managed llamafile and waits for it under a
// deadline of cfg.StartupTimeout (default 5 s). Close stops the process.
func startLlamafile(ctx context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	dataDir := cfg.DataDir
	if dataDir == "" {
		dataDir = DefaultDataDir()
	}
	mgr := New(Config{DataDir: dataDir, ModelURL: cfg.ModelURL}, logger)
	if !mgr.IsSetUp() {
		return nil, fmt.Errorf("slm: llamafile not set up; run: gnoma slm setup")
	}
	timeout := cfg.StartupTimeout
	if timeout <= 0 {
		timeout = 5 * time.Second
	}
	bootCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	baseURL, err := mgr.Start(bootCtx)
	if err != nil {
		return nil, fmt.Errorf("slm: start llamafile: %w", err)
	}
	prov, err := openaicompat.NewLlamafile(provider.ProviderConfig{BaseURL: baseURL + "/v1"})
	if err != nil {
		_ = mgr.Stop()
		return nil, fmt.Errorf("slm: llamafile provider: %w", err)
	}
	return &Boot{
		Backend:     BackendLlamafile,
		Provider:    prov,
		Model:       "default",
		BootTime:    mgr.StartupDuration(),
		ToolSupport: probeLlamacppToolSupport(baseURL), // llamafile speaks the llama.cpp server protocol
		Close:       func() error { return mgr.Stop() },
	}, nil
}

// autoStart picks the first available backend in priority order:
//
//  1. Explicit llamafile (ModelURL or DataDir is set, AND the manifest is on
//     disk): respects users who already ran `gnoma slm setup`.
//  2. Ollama, if reachable with at least one model.
//  3. llama.cpp, if reachable.
//  4. llamafile, if a manifest exists in the default data dir.
//  5. Nothing: returns (nil, nil); caller stays on heuristic classifier.
func autoStart(ctx context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	// Hint: if the user has llamafile config set, prefer it. Checks both
	// fields so the code matches step 1 of the doc comment.
	if cfg.ModelURL != "" || cfg.DataDir != "" {
		mgr := New(Config{
			DataDir:  defaultIfEmpty(cfg.DataDir, DefaultDataDir()),
			ModelURL: cfg.ModelURL,
		}, logger)
		if mgr.IsSetUp() {
			return startLlamafile(ctx, cfg, logger)
		}
	}
	if model, ok := pickSmallestOllamaModel(ollamaDefaultURL); ok {
		c := cfg
		c.Model = model
		boot, err := startOllama(ctx, c, logger)
		if err == nil {
			return boot, nil
		}
		logger.Debug("slm auto: ollama probe found models but provider init failed", "error", err)
	}
	if llamacppReachable(llamacppDefaultURL) {
		boot, err := startLlamaCpp(ctx, cfg, logger)
		if err == nil {
			return boot, nil
		}
	}
	// Last resort: a manifest in the default data dir. Pin DataDir so that
	// startLlamafile checks the same directory that was just probed.
	fallback := cfg
	fallback.DataDir = DefaultDataDir()
	mgr := New(Config{DataDir: fallback.DataDir, ModelURL: cfg.ModelURL}, logger)
	if mgr.IsSetUp() {
		return startLlamafile(ctx, fallback, logger)
	}
	logger.Info("slm auto: no backend reachable; staying on heuristic classifier")
	return nil, nil
}

// ---- Discovery helpers --------------------------------------------------

// pickSmallestOllamaModel returns the model with the smallest reported size
// from the Ollama /api/tags endpoint. Returns ("", false) when Ollama is not
// reachable or has no models.
func pickSmallestOllamaModel(baseURL string) (string, bool) {
	client := &http.Client{Timeout: 1500 * time.Millisecond}
	resp, err := client.Get(baseURL + "/api/tags")
	if err != nil {
		return "", false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return "", false
	}
	var body struct {
		Models []struct {
			Name string `json:"name"`
			Size int64  `json:"size"`
		} `json:"models"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", false
	}
	if len(body.Models) == 0 {
		return "", false
	}
	sort.Slice(body.Models, func(i, j int) bool {
		return body.Models[i].Size < body.Models[j].Size
	})
	return body.Models[0].Name, true
}

// llamacppReachable reports whether a llama.cpp server answers GET /props
// with 200 OK within 750 ms.
func llamacppReachable(baseURL string) bool {
	client := &http.Client{Timeout: 750 * time.Millisecond}
	resp, err := client.Get(baseURL + "/props")
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	return resp.StatusCode == http.StatusOK
}

// probeOllamaToolSupport asks Ollama's /api/show whether a given model
// advertises the "tools" capability. Returns false on any error or when
// the capability is missing. Conservative: assume no tools when unsure.
func probeOllamaToolSupport(baseURL, model string) bool {
	body, err := json.Marshal(map[string]string{"model": model})
	if err != nil {
		return false
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/show", bytes.NewReader(body))
	if err != nil {
		return false
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var r struct {
		Capabilities []string `json:"capabilities"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return false
	}
	for _, c := range r.Capabilities {
		if c == "tools" {
			return true
		}
	}
	return false
}

// probeLlamacppToolSupport asks llama.cpp's /props endpoint whether the
// chat template advertises tool support. Same convention as Ollama: assume
// no tools when the probe fails.
func probeLlamacppToolSupport(baseURL string) bool {
	client := &http.Client{Timeout: 1500 * time.Millisecond}
	resp, err := client.Get(baseURL + "/props")
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var r struct {
		ChatTemplateCaps struct {
			SupportsTools     bool `json:"supports_tools"`
			SupportsToolCalls bool `json:"supports_tool_calls"`
		} `json:"chat_template_caps"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return false
	}
	// Require both flags; a template that reports only one is treated as
	// not tool-capable.
	return r.ChatTemplateCaps.SupportsTools && r.ChatTemplateCaps.SupportsToolCalls
}

// defaultIfEmpty returns s, or fallback when s is empty.
func defaultIfEmpty(s, fallback string) string {
	if s == "" {
		return fallback
	}
	return s
}