a14fe8b504
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:
1. llamafile cold-start kept the SLM out of pipe-mode runs, which
   always finished before the 15 s health check passed
2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
   (ToolUse=false) from 9/10 task types
3. armTier hard-coded CLI agents > local > API, so even when the SLM
   arm was feasible a CLI agent won
Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.
Backend layer (the bigger change)
The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:
- ollama (any local Ollama daemon)
- llamacpp (any llama.cpp server)
- llamafile (gnoma-managed, current behaviour)
- openaicompat (LM Studio, vLLM, remote API)
- auto (probes in order, picks first reachable)
- disabled
The [slm].backend key in config.toml selects the backend;
docs/slm-backends.md documents each one with a copy-paste preset. The
factory probes the underlying model's actual capabilities (Ollama
/api/show, llama.cpp /props) and sets the SLM arm's ToolUse
accordingly, so the arm picks up simple file-read style tasks on
tool-capable models and stays knowledge-only on completion-only models.
Trivial-prompt heuristic (Gate 2)
ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
Complexity-aware tier ordering (Gate 3)
armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
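The ordering can be sketched as follows. This is an illustration under
stated assumptions, not the router's actual code: Arm, ArmKind, Task,
feasible, and pick are hypothetical, and a MaxComplexity of 0 is
assumed to mean "no ceiling".

```go
package main

// ArmKind is a hypothetical classification of router arms.
type ArmKind int

const (
	KindCLI ArmKind = iota
	KindLocal
	KindAPI
)

// Arm is a hypothetical router arm. MaxComplexity == 0 is assumed to mean
// the arm has no complexity ceiling.
type Arm struct {
	Name          string
	Kind          ArmKind
	MaxComplexity float64
}

// Task carries only the field this sketch needs.
type Task struct {
	Complexity float64
}

// feasible excludes arms whose ceiling is below the task's complexity.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.Complexity <= a.MaxComplexity
}

// armTier returns 0 for a capped arm whose ceiling fits the task, and
// otherwise falls back to the CLI (1) > local (2) > API (3) ordering.
func armTier(a Arm, t Task) int {
	if a.MaxComplexity > 0 && t.Complexity <= a.MaxComplexity {
		return 0
	}
	switch a.Kind {
	case KindCLI:
		return 1
	case KindLocal:
		return 2
	default:
		return 3
	}
}

// pick returns the feasible arm with the lowest tier, if any.
func pick(arms []Arm, t Task) (Arm, bool) {
	var best Arm
	bestTier, found := 0, false
	for _, a := range arms {
		if !feasible(a, t) {
			continue
		}
		if tier := armTier(a, t); !found || tier < bestTier {
			best, bestTier, found = a, tier, true
		}
	}
	return best, found
}
```

With a CLI arm, an API arm, and an SLM arm capped at 0.3, a task of
complexity 0.1 goes to the SLM arm; at 0.8 the SLM arm is infeasible
and the CLI arm wins as before.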
Eager boot with user-facing wait (Gate 1)
Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.
waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
Classifier reliability
Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.
Probe + telemetry surfaces
`gnoma slm status` now prints the configured backend, the model, and a
live probe result (✓/✗) instead of just the llamafile manifest state.
`gnoma router stats` has shown the classifier-source mix since the
previous commit; with this change the slm / slm_fallback / heuristic
share finally moves from "always heuristic" to something reflecting
real SLM activity.
Tests
- 9 new backend-factory tests (httptest-backed Ollama probe, error
paths, auto-detection, capability flags)
- Tier-ordering tests cover the new "specialised small arm wins
trivial task" path
- Trivial-prompt heuristic tested for both halves (knowledge-only
flips RequiresTools=false; debug/file/run keeps it true)
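For readers unfamiliar with the httptest-backed approach, here is a
sketch of how a fake Ollama endpoint can drive the tool-support probe.
probeToolSupport mirrors probeOllamaToolSupport from
internal/slm/backend.go; fakeOllama is a hypothetical helper, and the
actual tests in this commit may be structured differently:

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"time"
)

// fakeOllama serves a minimal /api/show response advertising caps.
func fakeOllama(caps []string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/show", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string][]string{"capabilities": caps})
	})
	return httptest.NewServer(mux)
}

// probeToolSupport reports whether the model advertises the "tools"
// capability. Any error counts as "no tools".
func probeToolSupport(baseURL, model string) bool {
	body, err := json.Marshal(map[string]string{"model": model})
	if err != nil {
		return false
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/show", bytes.NewReader(body))
	if err != nil {
		return false
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var r struct {
		Capabilities []string `json:"capabilities"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return false
	}
	for _, c := range r.Capabilities {
		if c == "tools" {
			return true
		}
	}
	return false
}
```

Pointing the probe at the fake server's URL exercises the full HTTP
round trip without needing a real Ollama daemon on the test machine.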
Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
370 lines
11 KiB
Go
package slm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
	"net/http"
	"sort"
	"strings"
	"time"

	"somegit.dev/Owlibou/gnoma/internal/provider"
	"somegit.dev/Owlibou/gnoma/internal/provider/openaicompat"
)

// Backend identifies an SLM execution backend.
type Backend string

const (
	BackendAuto         Backend = "auto"
	BackendOllama       Backend = "ollama"
	BackendLlamaCpp     Backend = "llamacpp"
	BackendLlamafile    Backend = "llamafile"
	BackendOpenAICompat Backend = "openaicompat"
	BackendDisabled     Backend = "disabled"
)

// BackendConfig is the subset of config.SLMSection that StartBackend needs.
// Decoupled from the config package so the slm package can be imported from
// anywhere without a dependency cycle.
type BackendConfig struct {
	Backend        Backend
	Model          string
	BaseURL        string
	ModelURL       string
	DataDir        string
	StartupTimeout time.Duration
	// ToolSupport overrides auto-detection for backends we can't probe
	// generically (openaicompat). Ignored when auto-detection succeeds.
	ToolSupport bool
}

// Boot is a started SLM backend, ready to act as a provider.Provider for the
// classifier and as a router arm. Close is always non-nil; for stateless
// backends (Ollama, llamacpp, openaicompat) it's a no-op.
type Boot struct {
	Backend     Backend
	Provider    provider.Provider
	Model       string
	BootTime    time.Duration
	ToolSupport bool // true when the underlying model is known to handle tool calls
	Close       func() error
}

// StartBackend dispatches by cfg.Backend and returns a started SLM. Returns
// (nil, nil) when the chosen backend is "disabled" or when "auto" found no
// available backend; callers then stay on the heuristic classifier silently.
// Returns a non-nil error when the configuration itself is broken (unknown
// backend, missing required field for an explicit choice) or when a selected
// backend fails to start.
func StartBackend(ctx context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	if logger == nil {
		logger = slog.Default()
	}

	backend := cfg.Backend
	if backend == "" {
		backend = BackendAuto
	}

	switch backend {
	case BackendDisabled:
		return nil, nil
	case BackendOllama:
		return startOllama(ctx, cfg, logger)
	case BackendLlamaCpp:
		return startLlamaCpp(ctx, cfg, logger)
	case BackendLlamafile:
		return startLlamafile(ctx, cfg, logger)
	case BackendOpenAICompat:
		return startOpenAICompat(ctx, cfg, logger)
	case BackendAuto:
		return autoStart(ctx, cfg, logger)
	default:
		return nil, fmt.Errorf("slm: unknown backend %q", backend)
	}
}

// ---- Backend implementations --------------------------------------------

const (
	ollamaDefaultURL   = "http://localhost:11434"
	llamacppDefaultURL = "http://localhost:8080"
)

// startOllama wraps an already-running Ollama daemon. When cfg.Model is
// empty it picks the smallest installed model.
func startOllama(_ context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	baseURL := strings.TrimRight(cfg.BaseURL, "/")
	if baseURL == "" {
		baseURL = ollamaDefaultURL
	}
	model := cfg.Model
	if model == "" {
		// Try to pick a sensible default model.
		picked, ok := pickSmallestOllamaModel(baseURL)
		if !ok {
			return nil, fmt.Errorf("slm: ollama backend requires [slm] model, and no models were reachable at %s", baseURL)
		}
		model = picked
		logger.Info("slm: auto-picked Ollama model", "model", model, "base_url", baseURL)
	}
	apiURL := baseURL + "/v1"
	begin := time.Now()
	prov, err := openaicompat.NewOllama(provider.ProviderConfig{BaseURL: apiURL})
	if err != nil {
		return nil, fmt.Errorf("slm: ollama provider: %w", err)
	}
	return &Boot{
		Backend:     BackendOllama,
		Provider:    prov,
		Model:       model,
		BootTime:    time.Since(begin),
		ToolSupport: probeOllamaToolSupport(baseURL, model),
		Close:       func() error { return nil },
	}, nil
}

// startLlamaCpp wraps an already-running llama.cpp server. The model name
// falls back to "default" when cfg.Model is empty.
func startLlamaCpp(_ context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	baseURL := strings.TrimRight(cfg.BaseURL, "/")
	if baseURL == "" {
		baseURL = llamacppDefaultURL
	}
	model := cfg.Model
	if model == "" {
		model = "default" // llama.cpp server ignores the model field
	}
	apiURL := baseURL + "/v1"
	begin := time.Now()
	prov, err := openaicompat.NewLlamaCpp(provider.ProviderConfig{BaseURL: apiURL})
	if err != nil {
		return nil, fmt.Errorf("slm: llamacpp provider: %w", err)
	}
	logger.Info("slm: using llama.cpp backend", "base_url", baseURL, "model", model)
	return &Boot{
		Backend:     BackendLlamaCpp,
		Provider:    prov,
		Model:       model,
		BootTime:    time.Since(begin),
		ToolSupport: probeLlamacppToolSupport(baseURL),
		Close:       func() error { return nil },
	}, nil
}

// startOpenAICompat wraps any OpenAI-compatible endpoint. Both base_url and
// model are required, and tool support comes from cfg.ToolSupport because
// there is no generic probe.
func startOpenAICompat(_ context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	baseURL := strings.TrimRight(cfg.BaseURL, "/")
	if baseURL == "" {
		return nil, fmt.Errorf("slm: openaicompat backend requires [slm] base_url")
	}
	model := cfg.Model
	if model == "" {
		return nil, fmt.Errorf("slm: openaicompat backend requires [slm] model")
	}
	begin := time.Now()
	prov, err := openaicompat.NewLlamafile(provider.ProviderConfig{BaseURL: baseURL})
	if err != nil {
		return nil, fmt.Errorf("slm: openaicompat provider: %w", err)
	}
	logger.Info("slm: using openai-compatible backend", "base_url", baseURL, "model", model)
	return &Boot{
		Backend:     BackendOpenAICompat,
		Provider:    prov,
		Model:       model,
		BootTime:    time.Since(begin),
		ToolSupport: cfg.ToolSupport, // user-asserted; no generic probe
		Close:       func() error { return nil },
	}, nil
}

// startLlamafile boots the gnoma-managed llamafile, waiting at most
// cfg.StartupTimeout (default 5 s) for it to start.
func startLlamafile(ctx context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	dataDir := cfg.DataDir
	if dataDir == "" {
		dataDir = DefaultDataDir()
	}
	mgr := New(Config{DataDir: dataDir, ModelURL: cfg.ModelURL}, logger)
	if !mgr.IsSetUp() {
		return nil, fmt.Errorf("slm: llamafile not set up; run: gnoma slm setup")
	}

	timeout := cfg.StartupTimeout
	if timeout <= 0 {
		timeout = 5 * time.Second
	}
	bootCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	baseURL, err := mgr.Start(bootCtx)
	if err != nil {
		return nil, fmt.Errorf("slm: start llamafile: %w", err)
	}
	prov, err := openaicompat.NewLlamafile(provider.ProviderConfig{BaseURL: baseURL + "/v1"})
	if err != nil {
		_ = mgr.Stop()
		return nil, fmt.Errorf("slm: llamafile provider: %w", err)
	}
	return &Boot{
		Backend:     BackendLlamafile,
		Provider:    prov,
		Model:       "default",
		BootTime:    mgr.StartupDuration(),
		ToolSupport: probeLlamacppToolSupport(baseURL), // llamafile speaks the llama.cpp server protocol
		Close:       func() error { return mgr.Stop() },
	}, nil
}

// autoStart picks the first available backend in priority order:
//
//  1. Explicit llamafile (ModelURL is set AND the manifest is on disk);
//     respects users who already ran `gnoma slm setup`.
//  2. Ollama, if reachable at the default URL with at least one model.
//  3. llama.cpp, if reachable at the default URL.
//  4. llamafile, if a manifest exists in the default data dir.
//  5. Nothing → returns (nil, nil); caller stays on heuristic classifier.
func autoStart(ctx context.Context, cfg BackendConfig, logger *slog.Logger) (*Boot, error) {
	// Hint: if the user has llamafile config set, prefer it.
	if cfg.ModelURL != "" {
		mgr := New(Config{
			DataDir:  defaultIfEmpty(cfg.DataDir, DefaultDataDir()),
			ModelURL: cfg.ModelURL,
		}, logger)
		if mgr.IsSetUp() {
			return startLlamafile(ctx, cfg, logger)
		}
	}

	if model, ok := pickSmallestOllamaModel(ollamaDefaultURL); ok {
		c := cfg
		c.Model = model
		boot, err := startOllama(ctx, c, logger)
		if err == nil {
			return boot, nil
		}
		logger.Debug("slm auto: ollama probe found models but provider init failed", "error", err)
	}

	if llamacppReachable(llamacppDefaultURL) {
		boot, err := startLlamaCpp(ctx, cfg, logger)
		if err == nil {
			return boot, nil
		}
	}

	mgr := New(Config{DataDir: DefaultDataDir(), ModelURL: cfg.ModelURL}, logger)
	if mgr.IsSetUp() {
		return startLlamafile(ctx, cfg, logger)
	}

	logger.Info("slm auto: no backend reachable; staying on heuristic classifier")
	return nil, nil
}

// ---- Discovery helpers --------------------------------------------------

// pickSmallestOllamaModel returns the model with the smallest reported size
// from the Ollama /api/tags endpoint. Returns ("", false) when Ollama is not
// reachable or has no models.
func pickSmallestOllamaModel(baseURL string) (string, bool) {
	client := &http.Client{Timeout: 1500 * time.Millisecond}
	resp, err := client.Get(baseURL + "/api/tags")
	if err != nil {
		return "", false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return "", false
	}
	var body struct {
		Models []struct {
			Name string `json:"name"`
			Size int64  `json:"size"`
		} `json:"models"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", false
	}
	if len(body.Models) == 0 {
		return "", false
	}
	sort.Slice(body.Models, func(i, j int) bool {
		return body.Models[i].Size < body.Models[j].Size
	})
	return body.Models[0].Name, true
}

// llamacppReachable reports whether a llama.cpp server answers GET /props
// at baseURL with 200 OK.
func llamacppReachable(baseURL string) bool {
	client := &http.Client{Timeout: 750 * time.Millisecond}
	resp, err := client.Get(baseURL + "/props")
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	return resp.StatusCode == http.StatusOK
}

// probeOllamaToolSupport asks Ollama's /api/show whether a given model
// advertises the "tools" capability. Returns false on any error or when
// the capability is missing. Conservative: assume no tools when unsure.
func probeOllamaToolSupport(baseURL, model string) bool {
	body, err := json.Marshal(map[string]string{"model": model})
	if err != nil {
		return false
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/show", bytes.NewReader(body))
	if err != nil {
		return false
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var r struct {
		Capabilities []string `json:"capabilities"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return false
	}
	for _, c := range r.Capabilities {
		if c == "tools" {
			return true
		}
	}
	return false
}

// probeLlamacppToolSupport asks llama.cpp's /props endpoint whether the
// chat template advertises tool support. Same convention as Ollama: assume
// no tools when the probe fails.
func probeLlamacppToolSupport(baseURL string) bool {
	client := &http.Client{Timeout: 1500 * time.Millisecond}
	resp, err := client.Get(baseURL + "/props")
	if err != nil {
		return false
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var r struct {
		ChatTemplateCaps struct {
			SupportsTools     bool `json:"supports_tools"`
			SupportsToolCalls bool `json:"supports_tool_calls"`
		} `json:"chat_template_caps"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return false
	}
	return r.ChatTemplateCaps.SupportsTools && r.ChatTemplateCaps.SupportsToolCalls
}

// defaultIfEmpty returns s, or fallback when s is empty.
func defaultIfEmpty(s, fallback string) string {
	if s == "" {
		return fallback
	}
	return s
}