3 Commits

Author SHA1 Message Date
vikingowl aca830e7db feat(engine): consumption-time stream-error failover
When a stream errors out before producing any user-visible content
(text, thinking, or tool calls), the engine now transparently retries
on the next-best arm instead of bubbling the error to the TUI. Covers
the case from the post-SLM screenshot: subprocess CLI agents that
exit non-zero on auth/config failures, network drops mid-stream,
rate-limited arms whose error surfaces after Stream() already returned.

Mechanism: the stream-create + consume blocks are wrapped in a labeled
streamLoop. On s.Err() != nil with empty accumulator, the engine emits
a new EventFailover ("↻ <failed_arm> failed (<reason>) — retrying on
another arm"), excludes the failed arm via task.ExcludedArms, and
re-enters the loop. Cap of 4 failovers per round.

Guards:
- !acc.HasContent() — if text/tool calls already streamed, fail loud
  rather than duplicate visible output on retry.
- isFailoverable(err) — deny-list approach: context.Canceled/Deadline
  and HTTP 400/413 are fatal; everything else (auth, rate limit, 5xx,
  subprocess exit, network) is failoverable.
- Router.ForcedArm() == "" — when the user pinned an arm via --provider,
  failover is disabled by design.
- failoverAttempt < maxFailovers — bounded retry budget.

TUI renders EventFailover under the existing "cost" role styling.
shortFailReason strips the subprocess wrapper envelope so the user sees
"Invalid API key. Try again." instead of
"subprocess: exit status 1: Error: Invalid API key. Try again.".

Tests cover the classifier (isFailoverable, shortFailReason), end-to-end
auth-error failover, content-already-streamed guard, and context-cancel
guard. Deterministic across 10x -race runs by giving the failing arm
IsCLIAgent=true to anchor it in tier 0 ahead of the API-tier backup.
2026-05-20 02:20:00 +02:00
vikingowl 0b4de6054d feat(tui): surface SLM backend + per-turn classifier in status bar
The TUI gave no indication that an SLM was configured or active.
You'd see the primary provider on the status line and nothing else,
even with [slm].enabled=true and a successfully booted backend.

Two surfaces added:

1. Status-bar SLM badge. The left side of the status line gains a
   dim " · slm: <model> ⚙" suffix when the backend booted, " · slm: ✗"
   when it failed, and nothing when SLM is disabled. The ⚙ marker
   indicates the model advertises tool support.

2. Per-turn classifier visibility. The existing routing event already
   produced "routed → <arm> (task: <type>)" lines in the chat history;
   it now also reports which classifier made the decision, e.g.
   "routed → ollama/ministral-3:3b (task: explain, by: slm_fallback)".
   Lets you tell in real time whether the SLM is actually classifying
   or falling back to the keyword heuristic.

Plumbing:
  - new tui.SLMInfo struct on tui.Config
  - main.go populates it after StartBackend returns
  - stream.Event gains RoutingClassifier; engine.runLoop fills it from
    task.ClassifierSource on the first round
2026-05-19 19:06:26 +02:00
vikingowl ce5f9d3dc9 feat(tui): Tier 3-4 UX improvements — split, routing, session naming, context bar
- Split app.go (2091→1378 lines) into rendering.go, events.go, init.go
- Add EventRouting stream event for router arm transparency
- Add session auto-naming from first user message
- Add context window progress bar in status bar
- Add /keys cheatsheet, /replay for resumed sessions
- Add inline cost-per-turn after assistant responses
- Add diff previews in fs.write/fs.edit permission prompts
- Collapse tool output to 3 lines by default (ctrl+o expands)
- Use AddPrefix for system context instead of InjectMessage
- Handle ContentThinking and ContentToolResult in session resume
- Show session title in resume picker
- Add /model numeric selection snapshot safety
2026-04-12 05:13:16 +02:00