Bundles the pending TUI work into a coherent batch. Bug fixes from
external review:
* expandPlaceholders: single-pass alternation regex over the original
input prevents `#p\d+` / `#img\d+` tokens inside pasted content from
being re-expanded after the bracket form is inlined.
* /incognito: gate savePromptHistory and the Ctrl+V image-write branch
on `!m.incognito` so the no-persistence contract holds.
* history.txt: write at mode 0600 (chmod existing 0644 files), create
parent dir at 0700, truncate to 500 entries on every save, slog.Warn
on errors instead of swallowing.
* triggerPickerAction: guard m.config.Engine before SetModel, matching
the /model handler.
* Picker key handler: navigation/enter/q consume, escape/ctrl+c close
the picker AND fall through to global handlers (so streaming cancel
and double-tap quit work with an overlay open), default swallows
stray input.
* Paste line count: report total non-empty lines instead of newline
count, ignoring trailing newlines (no more "+0 lines" for "abc").
* Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response
bind. /keys help text updated; picker help entries reordered.
* Tighter perms on .gnoma/pasted_image_*.png (0600).
Race-safety refactor: ApplyTheme used to mutate ~25 package-level
lipgloss styles in place. Replaced with an immutable themeStyles
snapshot and atomic.Pointer[themeStyles] swap. Readers go through a
theme() helper (one atomic load) instead of touching package vars
directly. No locks, no nested-RLock risk if rendering ever moves
off-thread.
Includes pre-existing in-flight work: TUISection in config with
persistent theme/vim settings; /copy /theme /vim slash commands;
provider-name completion; session.SetProvider for the provider picker.
Tests: placeholder_test.go (6 regression + happy-path cases including
the pasted-content collision), history_test.go (5 cases covering perms
on new and existing files, on-disk truncation, blank-input, newline
flattening), provider_test.go (provider switching + picker transitions
+ SLM gating).
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:
1. llamafile cold-start blocked pipe-mode runs (always faster than
the 15 s health check)
2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
(ToolUse=false) from 9/10 task types
3. armTier hard-coded CLI agents > local > API, so even when the SLM
arm was feasible a CLI agent won
Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.
Backend layer (the bigger change)
The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:
- ollama (any local Ollama daemon)
- llamacpp (any llama.cpp server)
- llamafile (gnoma-managed, current behaviour)
- openaicompat (LM Studio, vLLM, remote API)
- auto (probes in order, picks first reachable)
- disabled
[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.
Trivial-prompt heuristic (Gate 2)
ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
Complexity-aware tier ordering (Gate 3)
armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
Eager boot with user-facing wait (Gate 1)
Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.
waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
Classifier reliability
Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.
Probe + telemetry surfaces
gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.
`gnoma router stats` already (from the previous commit) shows the
classifier-source mix; with this change you can finally see slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.
Tests
- 9 new backend-factory tests (httptest-backed Ollama probe, error
paths, auto-detection, capability flags)
- Tier-ordering tests cover the new "specialised small arm wins
trivial task" path
- Trivial-prompt heuristic tested for both halves (knowledge-only
flips RequiresTools=false; debug/file/run keeps it true)
Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
Remove the hardcoded mistral default so gnoma starts without any
provider configured. TUI mode uses a stubProvider that lets CLI agent
arms (claude, gemini, etc.) handle routing; pipe mode prints a clear
setup message.
Also: gnoma slm setup now auto-writes the default model_url to the
global config when none is set, instead of erroring.
- ctrl+o toggles between 10-line truncated and full tool output
- Label shows "ctrl+o to expand" (lowercase)
- Fixed: auto permission mode now sticks — config default was
overriding flag default ("default" → "auto" in config defaults)