58 Commits

Author SHA1 Message Date
vikingowl 0d3d190a8b fix(slm,session,router): classifier-only SLMs + session error recovery + feasibility diagnostics
Three coupled fixes that surfaced from a single FunctionGemma test
session where the SLM-as-execution-arm assumption broke down and
every subsequent prompt failed with 'session not idle (state: error)'.

(A) [slm].register_as_arm config. The SLM has always been
unconditionally registered as both classifier AND tier-0 execution
arm. Fine for general-purpose models (ministral, qwen3-chat); breaks
for task-specialised models (FunctionGemma emits function-call
syntax instead of prose; embedding models can't generate). New
pointer-bool config: nil/absent preserves the historical default
(true), explicit false makes the SLM classifier-only and the
execution path skips the slm/* arm. Three table tests cover absent
/ explicit-false / explicit-true decode paths.
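
The absent / explicit-false / explicit-true distinction can be sketched as follows. This is a minimal illustration, not the project's code: the `SLMConfig` type, the `ShouldRegisterAsArm` helper, and the use of JSON in place of TOML (to stay within the standard library) are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SLMConfig is a hypothetical stand-in for the [slm] config section.
type SLMConfig struct {
	// A pointer lets us tell "key absent" (nil) apart from an explicit false.
	RegisterAsArm *bool `json:"register_as_arm"`
}

// ShouldRegisterAsArm returns true when the key is absent (the historical
// default) and the explicit value otherwise.
func (c SLMConfig) ShouldRegisterAsArm() bool {
	if c.RegisterAsArm == nil {
		return true
	}
	return *c.RegisterAsArm
}

// decode parses a JSON document into SLMConfig (JSON stands in for TOML here).
func decode(doc string) (SLMConfig, error) {
	var c SLMConfig
	err := json.Unmarshal([]byte(doc), &c)
	return c, err
}

func main() {
	docs := []string{`{}`, `{"register_as_arm": false}`, `{"register_as_arm": true}`}
	for _, doc := range docs {
		c, err := decode(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28s -> register as arm: %v\n", doc, c.ShouldRegisterAsArm())
	}
}
```

A plain `bool` field could not express this, because its zero value is indistinguishable from an explicit `false`.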

(B) Session error recovery. After any routing or engine error, the
session moved to StateError and stayed there until restart — every
new user prompt got rejected with 'session not idle (state: error)'.
ResetError() was already wired for the /init retry path, but the
general user-input and slash-command paths didn't call it. Added
ResetError() before every user-initiated Send in the TUI so a fresh
prompt always represents intent-to-retry. The /init internal retry
already had its own ResetError; left alone.

(C) filterFeasible per-arm rejection logging. Today's 'no feasible
arm for task X' error tells you THAT every arm was rejected but
nothing about WHY. Added slog.Debug per rejection (arm, task,
complexity, reason, the specific violated constraint) plus a
summary line when zero arms are feasible at any quality. Visible
with --verbose; quiet otherwise. Surface area expansion only — no
behaviour change for users not chasing a bug.
2026-05-25 01:57:16 +02:00
vikingowl 6c5e969217 feat(tui): add /router command for runtime routing-preference switch
Mirrors the pattern of /permission: bare command shows the current
value plus a help line; with an argument (auto/local/cloud) it calls
Router.SetPreferPolicy and emits a system message. Session-only — does
not write back to config.toml, matching /permission and Ctrl+X
incognito-toggle conventions.

Tab completion on the value via routerPreferModes alongside the
existing permissionModes pattern. Help text updated. Status-bar
indicator deferred (separate concern if it turns out to be wanted).
2026-05-24 22:13:27 +02:00
vikingowl 74bd570438 fix(tui): de-dupe /init in command picker; skill names shadow builtins
/init appeared twice in the completion picker — once from the static
builtinCommands list and once from the bundled init skill at
internal/skill/skills/init.md (registered via skills.All()).

Two changes:

- Remove /init from builtinCommands. The skill provides the canonical
  entry, and its description ('Generate or update AGENTS.md project
  documentation') is more accurate than the static one ('initialize
  project — create AGENTS.md') because the skill handles both create
  and update.
- Refactor completionSource() so a skill name silently shadows any
  builtin with the same name. Prevents this from recurring if a
  future builtin migrates to a skill, and lets users override a
  builtin's description by dropping a skill of the same name into
  .gnoma/skills/.
2026-05-24 22:08:46 +02:00
vikingowl bd41d76e32 refactor(tui): store pasted images in user cache, not project workdir
Ctrl+V image paste used to write the file to .gnoma/pasted_image_*.png
under the project root, which polluted the workdir and risked
committing screenshots that may contain sensitive content.

Now writes to os.UserCacheDir() / gnoma / pasted-images/ (XDG cache
on Linux, ~/Library/Caches on macOS, %LocalAppData% on Windows).
The directory is created at 0700 and files at 0600 since pasted
content can be sensitive.

Each paste prunes entries older than 2 hours best-effort, so the
cache doesn't accumulate across sessions. The 2h window safely
covers any single turn including provider retries and slow
subprocess CLIs that need the file to still exist on disk when
they ingest the path.

.gitignore: cover the legacy `.gnoma/pasted_image_*` location for
old checkouts; add log.txt and codex_out.jsonl which were tracked
as runtime artifacts during the recent work.

Tests cover cache-path placement, restrictive perms on both the
directory and the file, the no-pollution-of-cwd invariant, and the
prune behavior (stale removed, fresh kept, missing dir no-op).
2026-05-22 11:56:04 +02:00
vikingowl e38cce5f1f fix(tui): security hardening, race-safety, and event handling fixes
Bundles the pending TUI work into a coherent batch. Bug fixes from
external review:

* expandPlaceholders: single-pass alternation regex over the original
  input prevents `#p\d+` / `#img\d+` tokens inside pasted content from
  being re-expanded after the bracket form is inlined.
* /incognito: gate savePromptHistory and the Ctrl+V image-write branch
  on `!m.incognito` so the no-persistence contract holds.
* history.txt: write at mode 0600 (chmod existing 0644 files), create
  parent dir at 0700, truncate to 500 entries on every save, slog.Warn
  on errors instead of swallowing.
* triggerPickerAction: guard m.config.Engine before SetModel, matching
  the /model handler.
* Picker key handler: navigation/enter/q consume, escape/ctrl+c close
  the picker AND fall through to global handlers (so streaming cancel
  and double-tap quit work with an overlay open), default swallows
  stray input.
* Paste line count: report total non-empty lines instead of newline
  count, ignoring trailing newlines (no more "+0 lines" for "abc").
* Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response
  bind. /keys help text updated; picker help entries reordered.
* Tighter perms on .gnoma/pasted_image_*.png (0600).

Race-safety refactor: ApplyTheme used to mutate ~25 package-level
lipgloss styles in place. Replaced with an immutable themeStyles
snapshot and atomic.Pointer[themeStyles] swap. Readers go through a
theme() helper (one atomic load) instead of touching package vars
directly. No locks, no nested-RLock risk if rendering ever moves
off-thread.
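
The snapshot-and-swap pattern can be sketched like this. It is illustrative only: the real `themeStyles` holds lipgloss styles, whereas this version uses two plain string fields, and the `current` variable name is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// themeStyles is an immutable snapshot: once published, it is never mutated.
type themeStyles struct {
	Name   string
	Accent string
}

// current holds the active snapshot (hypothetical variable name).
var current atomic.Pointer[themeStyles]

func init() {
	current.Store(&themeStyles{Name: "dark", Accent: "#7D56F4"})
}

// theme returns the active snapshot with a single atomic load.
func theme() *themeStyles { return current.Load() }

// ApplyTheme builds a fresh snapshot and publishes it atomically.
// Readers holding the old pointer keep a consistent view.
func ApplyTheme(name, accent string) {
	current.Store(&themeStyles{Name: name, Accent: accent})
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s := theme() // one load, then read fields from the same snapshot
				_ = s.Name + s.Accent
			}
		}()
	}
	ApplyTheme("light", "#005F87")
	wg.Wait()
	fmt.Println("active theme:", theme().Name)
}
```

Because every reader loads the pointer once and then reads fields from that snapshot, it can never observe a half-applied theme.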

Includes pre-existing in-flight work: TUISection in config with
persistent theme/vim settings; /copy /theme /vim slash commands;
provider-name completion; session.SetProvider for the provider picker.

Tests: placeholder_test.go (6 regression + happy-path cases including
the pasted-content collision), history_test.go (5 cases covering perms
on new and existing files, on-disk truncation, blank-input, newline
flattening), provider_test.go (provider switching + picker transitions
+ SLM gating).
2026-05-22 11:50:12 +02:00
vikingowl c4fde583f5 chore(lint): gofmt sweep + errcheck cleanups in router discovery
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
2026-05-20 03:13:05 +02:00
vikingowl aca830e7db feat(engine): consumption-time stream-error failover
When a stream errors out before producing any user-visible content
(text, thinking, or tool calls), the engine now transparently retries
on the next-best arm instead of bubbling the error to the TUI. Covers
the case from the post-SLM screenshot: subprocess CLI agents that
exit non-zero on auth/config failures, network drops mid-stream,
rate-limited arms whose error surfaces after Stream() already returned.

Mechanism: the stream-create + consume blocks are wrapped in a labeled
streamLoop. On s.Err() != nil with empty accumulator, the engine emits
a new EventFailover ("↻ <failed_arm> failed (<reason>) — retrying on
another arm"), excludes the failed arm via task.ExcludedArms, and
re-enters the loop. Cap of 4 failovers per round.

Guards:
- !acc.HasContent() — if text/tool calls already streamed, fail loud
  rather than duplicate visible output on retry.
- isFailoverable(err) — deny-list approach: context.Canceled/Deadline
  and HTTP 400/413 are fatal; everything else (auth, rate limit, 5xx,
  subprocess exit, network) is failoverable.
- Router.ForcedArm() == "" — when the user pinned an arm via --provider,
  failover is disabled by design.
- failoverAttempt < maxFailovers — bounded retry budget.
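
The four guards combine roughly as below. This is a sketch under assumptions: `httpStatusError` is a hypothetical error type, `canFailover` is a hypothetical helper, and the real engine likely inspects its own provider error types.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// httpStatusError is a hypothetical error carrying an HTTP status code.
type httpStatusError struct{ Status int }

func (e *httpStatusError) Error() string { return fmt.Sprintf("http status %d", e.Status) }

// Sample errors used by main.
var (
	errAuth      error = &httpStatusError{Status: 401}
	errTooLarge  error = &httpStatusError{Status: 413}
	errCancelled error = fmt.Errorf("stream: %w", context.Canceled)
)

// isFailoverable uses a deny-list: only cancellation, deadline expiry,
// and HTTP 400/413 are fatal; every other error may be retried elsewhere.
func isFailoverable(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return false
	}
	var he *httpStatusError
	if errors.As(err, &he) && (he.Status == 400 || he.Status == 413) {
		return false
	}
	return true
}

// canFailover applies all four guards from the list above.
func canFailover(err error, hasContent bool, forcedArm string, attempt, maxFailovers int) bool {
	return !hasContent && isFailoverable(err) && forcedArm == "" && attempt < maxFailovers
}

func main() {
	fmt.Println(canFailover(errAuth, false, "", 0, 4))      // auth error, nothing streamed yet
	fmt.Println(canFailover(errAuth, true, "", 0, 4))       // content already visible
	fmt.Println(canFailover(errTooLarge, false, "", 0, 4))  // HTTP 413 is fatal
	fmt.Println(canFailover(errCancelled, false, "", 0, 4)) // user cancelled
}
```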

TUI renders EventFailover under the existing "cost" role styling.
shortFailReason strips the subprocess wrapper envelope so the user sees
"Invalid API key. Try again." instead of
"subprocess: exit status 1: Error: Invalid API key. Try again.".

Tests cover the classifier (isFailoverable, shortFailReason), end-to-end
auth-error failover, content-already-streamed guard, and context-cancel
guard. Deterministic across 10x -race runs by giving the failing arm
IsCLIAgent=true to anchor it in tier 0 ahead of the API-tier backup.
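
For the shortFailReason behaviour mentioned above, a minimal sketch might look like this. The exact envelope format is an assumption inferred from the single example in this message, and `shortReason` is a hypothetical name operating on the message string.

```go
package main

import (
	"fmt"
	"strings"
)

// shortReason strips an assumed "subprocess: exit status N: Error: " envelope
// and returns the remaining human-readable message.
func shortReason(msg string) string {
	msg = strings.TrimPrefix(msg, "subprocess: ")
	if strings.HasPrefix(msg, "exit status ") {
		// Drop everything up to and including the first ": " separator.
		if i := strings.Index(msg, ": "); i >= 0 {
			msg = msg[i+2:]
		}
	}
	return strings.TrimPrefix(msg, "Error: ")
}

func main() {
	fmt.Println(shortReason("subprocess: exit status 1: Error: Invalid API key. Try again."))
	fmt.Println(shortReason("connection reset by peer"))
}
```

Messages that do not match the assumed envelope pass through unchanged.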
2026-05-20 02:20:00 +02:00
vikingowl 34f6f1c786 feat(security): incognito coherence across firewall/router/persist (Wave 2)
Closes the cluster of audit findings where gnoma's incognito promise
('no persistence, no learning, local-only routing') silently broke
because state was duplicated across the CLI flag, the firewall's
IncognitoMode, the router's localOnly flag, and the TUI's local
m.incognito field. Wave 2 makes security.IncognitoMode the canonical
source of truth.

W2-1 Router.Select rejects forced non-local arms when localOnly is on
  rather than short-circuiting and silently routing to cloud. Main
  fails fast when --incognito + --provider <cloud> are combined; the
  TUI toggle (Ctrl+X, /incognito, config panel) refuses with an
  actionable message when a non-local arm is pinned. Factored the
  three duplicated toggle sites into Model.attemptIncognitoToggle.

W2-2 persist.Store.Save consults an IncognitoGate (local interface,
  *security.IncognitoMode satisfies it). nil gate = always persist
  (legacy behaviour for tests); non-nil gate is consulted on every
  Save so TUI runtime toggles take effect without reconstructing the
  store. File mode 0o600, dir mode 0o700.
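
The gate arrangement in W2-2 can be sketched as follows. Only the `IncognitoGate` name comes from this message; the `Active` method, the `toggle` type, and the in-memory `Store` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// IncognitoGate is the narrow interface the store consults on every Save.
// The method name Active is an assumption.
type IncognitoGate interface {
	Active() bool
}

// toggle is a hypothetical gate that can be flipped at runtime.
type toggle struct{ on atomic.Bool }

func (t *toggle) Active() bool { return t.on.Load() }

// Store is a simplified in-memory stand-in for the persistence layer.
type Store struct {
	gate  IncognitoGate // nil means "always persist" (legacy behaviour)
	saved []string
}

// Save persists the entry unless a non-nil gate reports incognito.
// Note: pass a true nil interface for legacy mode, not a typed nil pointer.
func (s *Store) Save(entry string) bool {
	if s.gate != nil && s.gate.Active() {
		return false
	}
	s.saved = append(s.saved, entry)
	return true
}

func main() {
	gate := &toggle{}
	store := &Store{gate: gate}
	fmt.Println(store.Save("before toggle"))
	gate.on.Store(true) // runtime toggle, no store reconstruction needed
	fmt.Println(store.Save("while incognito"))
	fmt.Println(len(store.saved))
}
```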

W2-3 tui.New seeds m.incognito from cfg.Firewall.Incognito().Active().
  Fixes the Ctrl+X-on-launch-with-incognito case where the first
  toggle silently turned the firewall OFF because the local flag
  started false out of sync with the firewall.

W2-4 saveQuality gates on both *incognito (defensive, covers the
  window before fwRef.Set fires) and fw.Incognito().ShouldLearn() (so
  TUI Ctrl+X suppresses the snapshot on exit). Quality restore skipped
  under --incognito. Quality file written 0o600 in dir 0o700.
  engine.reportOutcome and elf.Manager.ReportResult both gate on
  fw.Incognito().ShouldLearn() — bandit signal no longer leaks out of
  incognito sessions.

W2-5 session files written 0o600 in dirs 0o700 (was 0o644 / 0o755).

W2-6 IncognitoMode.LocalOnly dropped — dead field with no readers;
  routing local-only state lives on the router, not the firewall.

Also wires rtr.SetLocalOnly(true) when --incognito at launch — main
previously activated the firewall's flag but never told the router to
filter, so even without the forced-arm bug, launching with
--incognito alone gave you 'incognito badge but full arm pool'.
2026-05-19 22:57:36 +02:00
vikingowl d84b295da2 feat(tui): /profile slash command + status-bar profile badge (Phase C-3)
Adds the in-TUI surface for the profile system:

- Status bar carries " · profile: <name>" next to the SLM badge when
  profile mode is engaged (renders nothing in legacy single-config
  installations).
- /profile (no args) shows the active profile and lists available ones.
- /profile <name> switches by re-executing gnoma via syscall.Exec under
  --profile <name>. Critical cleanups (quality.json snapshot, SLM
  backend Close, session.Close) fire explicitly before exec since
  defers don't run after exec replaces the process image. Using
  syscall.Exec rather than a child process avoids stacking a process
  level on every switch and propagates the new gnoma's exit code
  directly to the shell.
- Autocomplete after "/profile " offers configured profile names; the
  completion source is threaded from main.go via tui.Config.

Conversation history is not preserved across a switch — profile change
implies different context, different keys, different permission mode,
so a clean reset is the correct semantic.
2026-05-19 21:59:11 +02:00
vikingowl 0b4de6054d feat(tui): surface SLM backend + per-turn classifier in status bar
The TUI gave no indication that an SLM was configured or active.
You'd see the primary provider on the status line and nothing else,
even with [slm].enabled=true and a successfully booted backend.

Two surfaces added:

1. Status-bar SLM badge. The left side of the status line gains a
   dim " · slm: <model> ⚙" suffix when the backend booted, " · slm: ✗"
   when it failed, and nothing when SLM is disabled. The ⚙ marker
   indicates the model advertises tool support.

2. Per-turn classifier visibility. The existing routing event already
   produced "routed → <arm> (task: <type>)" lines in the chat history;
   it now also reports which classifier made the decision, e.g.
   "routed → ollama/ministral-3:3b (task: explain, by: slm_fallback)".
   Lets you tell in real time whether the SLM is actually classifying
   or falling back to the keyword heuristic.

Plumbing:
  - new tui.SLMInfo struct on tui.Config
  - main.go populates it after StartBackend returns
  - stream.Event gains RoutingClassifier; engine.runLoop fills it from
    task.ClassifierSource on the first round
2026-05-19 19:06:26 +02:00
vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile cold-start kept the SLM out of pipe-mode runs (which
     always finish before the 15 s health check completes)
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
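
A minimal sketch of the complexity-aware ordering, assuming that tier 0 applies only to arms with an explicit (non-zero) ceiling and that MaxComplexity zero means "no ceiling". The `Arm` and `Task` types are simplified stand-ins, and `feasible` and `pick` are hypothetical helpers.

```go
package main

import "fmt"

// Arm and Task are simplified stand-ins for the router's types.
type Arm struct {
	ID            string
	IsCLIAgent    bool
	IsLocal       bool
	MaxComplexity float64 // 0 = no ceiling
}

type Task struct{ ComplexityScore float64 }

// feasible mirrors the MaxComplexity exclusion: arms with a ceiling
// are dropped when the task exceeds it.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.ComplexityScore <= a.MaxComplexity
}

// armTier returns 0 for a capped arm whose ceiling fits the task,
// then CLI agents (1), local arms (2), and API arms (3).
func armTier(a Arm, t Task) int {
	switch {
	case a.MaxComplexity > 0 && t.ComplexityScore <= a.MaxComplexity:
		return 0
	case a.IsCLIAgent:
		return 1
	case a.IsLocal:
		return 2
	default:
		return 3
	}
}

// pick returns the feasible arm with the lowest tier (first one wins ties).
func pick(arms []Arm, t Task) (Arm, bool) {
	best, found := Arm{}, false
	for _, a := range arms {
		if !feasible(a, t) {
			continue
		}
		if !found || armTier(a, t) < armTier(best, t) {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	arms := []Arm{
		{ID: "api/large"},
		{ID: "cli/agent", IsCLIAgent: true},
		{ID: "slm/small", IsLocal: true, MaxComplexity: 0.3},
	}
	trivial, _ := pick(arms, Task{ComplexityScore: 0.1})
	complex, _ := pick(arms, Task{ComplexityScore: 0.8})
	fmt.Println(trivial.ID, complex.ID)
}
```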

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.

Probe + telemetry surfaces

gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` already shows the classifier-source mix (added in
the previous commit); with this change you can finally see the slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00
vikingowl ec9433d783 chore(lint): clear remaining errcheck and staticcheck findings
Brings the project to a clean `make lint` baseline (0 issues).

Mechanical:
- Wrap deferred resp.Body.Close() in closures (router/discovery.go,
  router/probe.go) so the unchecked return surfaces as `_ = ...`.
- Apply `_ = ...` (single or multi-return blank) to test-file calls
  that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir
  in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send /
  LoadDir in tests that assert on side effects.
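
The deferred-close pattern from the first bullet looks roughly like this. It is a generic sketch: `trackedBody`, `newBody`, and `drain` are hypothetical names standing in for an HTTP response body and its consumer.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// trackedBody is a hypothetical ReadCloser that records whether Close ran.
type trackedBody struct {
	io.Reader
	closed bool
}

func (b *trackedBody) Close() error {
	b.closed = true
	return nil
}

// newBody wraps a string as a trackedBody, standing in for resp.Body.
func newBody(s string) *trackedBody {
	return &trackedBody{Reader: strings.NewReader(s)}
}

// drain reads the whole body. The deferred closure makes the ignored
// Close error explicit, which is what errcheck wants to see.
func drain(body io.ReadCloser) (string, error) {
	defer func() { _ = body.Close() }()
	data, err := io.ReadAll(body)
	return string(data), err
}

func main() {
	b := newBody("payload")
	got, err := drain(b)
	fmt.Println(got, err, b.closed)
}
```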

Structural:
- engine.handleRequestTooLarge drops the unused req parameter and
  rebuilds the request from compacted history (SA4009 — argument was
  overwritten before first use).
- provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch
  to tagged switches over the discriminator (QF1002).
- tui.app.go MouseWheel + inputMode and cmd/gnoma main slm-status use
  tagged switches in place of equality chains (QF1003).
- cmd/gnoma main.go merges a var decl with its immediate assignment
  (S1021).
- Three empty-branch sites (dispatcher_test, loader_test,
  coordinator_test) become real assertions or get the dead `if` removed
  (SA9003).
2026-05-19 17:53:42 +02:00
vikingowl 13b2f5e14d chore(lint): clear dead code and tighten lifecycle errcheck
Removes five unused funcs/vars/fields that golangci-lint had been
flagging (anthropic.toolCallDoneEvent, mistral.translateMessages,
hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase),
two ineffectual assignments (tui/rendering.go visible-window loop,
subprocess stream_test setup), and a stale if/HasPrefix that's now a
strings.TrimPrefix.

Wires errcheck onto every subprocess / stream lifecycle path so a
failed close or shutdown is at least logged rather than silently
dropped:

- engine/loop.go: stream.Close on both the error and success paths
- mcp/manager.go: Shutdown when StartAll partial-fails; Transport
  close after Initialize failure
- mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout
  fallback
- slm/download.go: Close propagated as a named-return error on the
  success path; explicitly discarded on the rollback path
- slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go,
  config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit
  ignores or error logging on Close / Shutdown / WalkDir / Scanln

Production-code errcheck and ineffassign are now zero. Remaining
golangci-lint output is test-only Close-in-defer noise plus
stylistic staticcheck QF suggestions, left alone.
2026-05-19 17:05:54 +02:00
vikingowl 135c8afe80 feat: various improvements to engine, router, and TUI
- engine/loop: enhanced loop handling
- router: dynamic model discovery and task improvements
- tui: suggestion box, input mode indicator, completions enhancements
2026-05-07 22:51:50 +02:00
vikingowl befcbdcfef feat(tui): suggestion box above input, input mode indicator, ! execute
- Suggestion dropdown now renders between separator and input (not in
  chat area) — no more box at the top of an empty chat
- Ghost text suppressed when dropdown is visible (eliminates the
  'fig' / trailing text on the right)
- Bottom separator shows purple 'cmd' label when typing '/' and
  yellow 'exec' label when typing '!'
- '! <cmd>' prefix executes a raw shell command inline and shows
  output in the chat (same as /shell but one-shot)
2026-05-07 17:35:45 +02:00
vikingowl d2139c6f0c perf+feat: parallel startup discovery + slash-command suggestion dropdown
Startup: HarvestAliases, HarvestInventory, DiscoverCLIAgents, and
DiscoverLocalModels now run concurrently. Worst case latency drops
from sum(all) to max(all) — eliminates the 15s inventory timeout
from blocking the main path.
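
The sum-to-max effect comes from running independent steps concurrently, which can be sketched as below. The `step` type and `runConcurrently` helper are hypothetical; the step names in `main` only echo the functions listed above and do no real discovery.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// step is a hypothetical named unit of startup work.
type step struct {
	name string
	run  func() string
}

// runConcurrently starts every step in its own goroutine and waits for all
// of them, so total latency is bounded by the slowest step, not the sum.
func runConcurrently(steps []step) map[string]string {
	out := make(map[string]string, len(steps))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, s := range steps {
		wg.Add(1)
		go func(s step) {
			defer wg.Done()
			v := s.run()
			mu.Lock() // the map is shared across goroutines
			out[s.name] = v
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return out
}

func main() {
	slow := func(d time.Duration, v string) func() string {
		return func() string { time.Sleep(d); return v }
	}
	start := time.Now()
	res := runConcurrently([]step{
		{"HarvestAliases", slow(30*time.Millisecond, "aliases")},
		{"HarvestInventory", slow(60*time.Millisecond, "inventory")},
		{"DiscoverCLIAgents", slow(20*time.Millisecond, "agents")},
		{"DiscoverLocalModels", slow(40*time.Millisecond, "models")},
	})
	fmt.Println(len(res), "steps finished in", time.Since(start).Round(10*time.Millisecond))
}
```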

TUI: typing '/co' now shows a bordered dropdown of all matching
commands with descriptions. ↑↓ navigate, Tab/Enter accepts the
highlighted entry, Esc dismisses. Ghost-text still works for
unique unambiguous matches.
2026-05-07 17:30:16 +02:00
vikingowl f8867f5d78 feat(tui): /config opens interactive settings panel
Replaces the text dump with a navigable bordered overlay.
↑↓ to move, Enter to cycle/toggle values, Esc to close.
Shows: Model (cycles through discovered arms), Permission mode,
Incognito toggle.
2026-05-07 17:23:43 +02:00
vikingowl a9213ec382 feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
  heuristic baseline blended so Priority/RequiredEffort are never zeroed,
  extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
  filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
  registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
2026-05-07 16:44:32 +02:00
vikingowl 0b1392cf6b feat(pty): Phase 2 — interactive shell and bash interactive detection
- /shell [cmd]: launch the user's $SHELL via tea.ExecProcess (PTY
  handoff), which hands the terminal to the shell and restores the TUI
  on exit. /shell <cmd> runs that command in the shell directly.
  Detects $SHELL > $COMSPEC > /bin/sh|powershell.exe in order.

- bash tool: detect interactive commands before execution
  Prefix-interactive: sudo, ssh, passwd, vim/vi/nano, less/more,
  htop/top, mysql/psql, ftp/sftp, git push.
  Exact-interactive (REPL): python3/python/node/irb/iex/ghci/julia.
  Returns a tool result with interactive=true metadata and a hint to
  use /shell instead of hanging or erroring.

- completions: add /shell to builtin command list
- help: document /shell [cmd]
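
The prefix/exact detection for the bash tool can be sketched as follows. The command lists are taken from this message; the matching rule (whole-word prefix match, exact match for REPLs) and the `isInteractive` name are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Commands that are interactive whenever they lead the command line.
var prefixInteractive = []string{
	"sudo", "ssh", "passwd", "vim", "vi", "nano", "less", "more",
	"htop", "top", "mysql", "psql", "ftp", "sftp", "git push",
}

// REPLs are interactive only when invoked bare (no script argument).
var exactInteractive = map[string]bool{
	"python3": true, "python": true, "node": true, "irb": true,
	"iex": true, "ghci": true, "julia": true,
}

// isInteractive reports whether cmd would likely need a terminal.
// Prefixes match on word boundaries, so "vimdiff" does not match "vim".
func isInteractive(cmd string) bool {
	cmd = strings.TrimSpace(cmd)
	if exactInteractive[cmd] {
		return true
	}
	for _, p := range prefixInteractive {
		if cmd == p || strings.HasPrefix(cmd, p+" ") {
			return true
		}
	}
	return false
}

func main() {
	for _, c := range []string{"ssh host", "python3", "python3 script.py", "git push origin main", "git status"} {
		fmt.Printf("%-22s interactive=%v\n", c, isInteractive(c))
	}
}
```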
2026-05-07 15:52:56 +02:00
vikingowl 176926924c feat(engine): M8 cleanup — Wave B skill enforcement
- Add tool.PathSensitiveTool interface (ExtractPaths); implement on all 6 fs tools
- Add engine.TurnOptions.AllowedPaths: restricts tool filesystem access per skill invocation
- Bash is denied outright when AllowedPaths is active (unparseable command args)
- fs tools with empty path (cwd default) resolved via os.Getwd() and validated
- Add engine.TurnOptions.AllowedTools + AllowedPaths wiring in pipe mode (main.go) and TUI skill dispatch (tui/app.go)
- Remove TODO(M8.3) from skill.Frontmatter — enforcement is now complete
2026-05-07 15:29:33 +02:00
vikingowl 7fbb5454ee feat(router): normalize effort/thinking abstraction across providers
Add EffortLevel (auto/low/medium/high) as a provider-agnostic reasoning
control, replacing the Capabilities.Thinking bool. Each provider maps
the level to its native parameter: Anthropic budget tokens (1K/8K/16K),
OpenAI reasoning_effort (low/medium/high), Google thinking budget
(1K/8K/16K). Task classification auto-infers effort from TaskType and
complexity; filterFeasible excludes arms that lack the required level.
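
A sketch of the per-provider mapping, with labelled assumptions: "1K/8K/16K" is read here as 1024/8192/16384 tokens, `auto` is modelled as "leave the provider default" (0 or an empty string), and the function names are hypothetical.

```go
package main

import "fmt"

// EffortLevel is the provider-agnostic reasoning control.
type EffortLevel int

const (
	EffortAuto EffortLevel = iota
	EffortLow
	EffortMedium
	EffortHigh
)

// thinkingBudget maps a level to a token budget (Anthropic / Google style).
// Assumption: 1K/8K/16K means 1024/8192/16384; auto returns 0 (unset).
func thinkingBudget(l EffortLevel) int {
	switch l {
	case EffortLow:
		return 1024
	case EffortMedium:
		return 8192
	case EffortHigh:
		return 16384
	default:
		return 0
	}
}

// reasoningEffort maps a level to OpenAI's reasoning_effort string.
// Assumption: auto returns "" so the parameter can be omitted.
func reasoningEffort(l EffortLevel) string {
	switch l {
	case EffortLow:
		return "low"
	case EffortMedium:
		return "medium"
	case EffortHigh:
		return "high"
	default:
		return ""
	}
}

func main() {
	for _, l := range []EffortLevel{EffortAuto, EffortLow, EffortMedium, EffortHigh} {
		fmt.Printf("level=%d budget=%d effort=%q\n", l, thinkingBudget(l), reasoningEffort(l))
	}
}
```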
2026-05-07 14:08:50 +02:00
vikingowl d71bd942c4 feat: local model reliability — SDK retries, capability probing, init skill, context compaction
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
  subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
  activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
  used API format (fs_ls) — now re-sanitized in translateMessage
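
The json.Valid completeness check from the first bullet can be sketched like this. The `argAccumulator` type is hypothetical, and dropping deltas after completion is an assumption about how repeated chunks are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// argAccumulator collects streamed tool-call argument deltas.
type argAccumulator struct {
	buf      strings.Builder
	complete bool
}

// Add appends a delta unless the arguments are already complete.
// Completeness is decided by json.Valid on the whole buffer, so a
// partial chunk such as "{" never marks the arguments as done.
func (a *argAccumulator) Add(delta string) {
	if a.complete {
		return // assumed: late repeats are dropped rather than doubled
	}
	a.buf.WriteString(delta)
	a.complete = json.Valid([]byte(a.buf.String()))
}

// Args returns the accumulated JSON text.
func (a *argAccumulator) Args() string { return a.buf.String() }

func main() {
	var acc argAccumulator
	for _, d := range []string{"{", `"path":`, `"main.go"}`} {
		acc.Add(d)
		fmt.Printf("after %-12q complete=%v\n", d, acc.complete)
	}
	fmt.Println(acc.Args())
}
```

This relies on tool arguments being a JSON object: no proper prefix of an object is itself valid JSON, so the check cannot fire early.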

Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
  (Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
  vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
  replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
2026-04-13 02:01:01 +02:00
vikingowl 2093beea58 fix: deterministic 500 retry, OpenAI error wrapping, local /init prompt
Stop retrying llama.cpp 500s that are deterministic tool-parse failures
by inspecting the error message body (ClassifyHTTPError). Wrap OpenAI SDK
errors as ProviderError so the engine's retry logic classifies them. Add
localInitPrompt for local models that uses sequential fs_* calls instead
of spawn_elfs (which local models can't produce reliably).
2026-04-12 18:35:18 +02:00
vikingowl 0caab0fed1 fix(router): discovery loop removes forced arm, breaking routing
The discovery loop's reconcileArms removed the CLI-forced arm
(llamacpp/default) because the llama.cpp server reports the real model
name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm
disappeared and all subsequent requests failed.

Three-layer fix:
- Eager: query the specific provider at startup to resolve the real
  model name before registering the forced arm
- Lazy: reconcileArms detects placeholder "default" arm names and
  atomically renames them when discovery reveals the real identity,
  with an onReconcile callback to update the session and TUI
- Guard: the forced arm is never garbage-collected by the removal loop

Also fixes misleading /init error messaging — failed inits now show
"loaded from disk (init failed)" instead of "AGENTS.md written to".
2026-04-12 17:51:30 +02:00
vikingowl ce5f9d3dc9 feat(tui): Tier 3-4 UX improvements — split, routing, session naming, context bar
- Split app.go (2091→1378 lines) into rendering.go, events.go, init.go
- Add EventRouting stream event for router arm transparency
- Add session auto-naming from first user message
- Add context window progress bar in status bar
- Add /keys cheatsheet, /replay for resumed sessions
- Add inline cost-per-turn after assistant responses
- Add diff previews in fs.write/fs.edit permission prompts
- Collapse tool output to 3 lines by default (ctrl+o expands)
- Use AddPrefix for system context instead of InjectMessage
- Handle ContentThinking and ContentToolResult in session resume
- Show session title in resume picker
- Add /model numeric selection snapshot safety
2026-04-12 05:13:16 +02:00
vikingowl 48e63a9bc0 feat(tui): Tier 1-2 UX improvements — completions, usage, provider status
Tier 1 (launch blockers):
- Remove /shell from /help (advertised but unimplemented)
- Kill dead _ = closeLen assignment
- Cache glamour renderer by width — no longer recreated on every
  WindowSizeMsg when width hasn't changed

Tier 2 (ship-quality UX):
- Slash command ghost-text completion with Tab accept. Sources: static
  command list + dynamic skill names. /permission gets arg completion
  for the 6 modes.
- /compact reports before/after token counts (e.g. "32k → 18k tokens")
- /provider shows all registered arms grouped by provider, not just
  "restart required"
- /usage command: input/output/total tokens, context %, provider, turns
- Widen Ctrl+C quit window from 1s to 2s
- "new content below" indicator when scrolled up during streaming
- Permission prompt: inline chat notification when approval needed,
  so the user notices even if focused on input
2026-04-12 04:19:55 +02:00
vikingowl e04cacc215 fix: append mutation, pipe-mode hang, Mistral regex false positives
- Fix append footgun: allHooks/allMCPServers allocated fresh to avoid
  mutating cfg's backing array (lines 391/413 in main.go)
- Fix pipe-mode permission prompt: detect no-TTY stdin and auto-deny
  instead of blocking forever on fmt.Scanln EOF
- Tighten Mistral API key regex from bare [a-zA-Z0-9]{32} (matched
  commit hashes, UUIDs) to context-gated pattern requiring "mistral"
  keyword nearby. Added scanner test for positives and negatives.
- Remove README demo GIF TODO placeholder
- Unify version string: pass buildVersion from ldflags into tui.Config
  instead of hardcoding "v0.1.0-dev"
- Populate benchmarks doc with actual Go benchmark results
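
The append footgun from the first bullet, and the fresh-allocation fix, can be reproduced in a few lines. `mergeFresh` is a hypothetical helper name; the slices stand in for the config's hook and MCP server lists.

```go
package main

import "fmt"

// mergeFresh copies base and extra into a newly allocated slice, so the
// result never shares a backing array with base.
func mergeFresh(base, extra []string) []string {
	out := make([]string, 0, len(base)+len(extra))
	out = append(out, base...)
	return append(out, extra...)
}

func main() {
	// base has spare capacity, as a slice decoded from config might.
	base := make([]string, 1, 4)
	base[0] = "cfg-hook"

	// Footgun: both appends write into base's backing array.
	a := append(base, "plugin-a")
	b := append(base, "plugin-b")
	fmt.Println("aliased:", a[1] == b[1]) // a's element was overwritten by b

	// Fix: allocate fresh, so each result owns its storage.
	c := mergeFresh(base, []string{"plugin-a"})
	d := mergeFresh(base, []string{"plugin-b"})
	fmt.Println("independent:", c[1], d[1])
}
```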
2026-04-12 03:49:47 +02:00
vikingowl 6c47f8643b feat(m8): MCP client, tool replaceability, and plugin system
Complete the remaining M8 extensibility deliverables:

- MCP client with JSON-RPC 2.0 over stdio transport, protocol
  lifecycle (initialize/tools-list/tools-call), and process group
  management for clean shutdown
- MCP tool adapter implementing tool.Tool with mcp__{server}__{tool}
  naming convention and replace_default for swapping built-in tools
- MCP manager for multi-server orchestration with parallel startup,
  tool discovery, and registry integration
- Plugin system with plugin.json manifest (name/version/capabilities),
  directory-based discovery (global + project scopes with precedence),
  loader that merges skills/hooks/MCP configs into existing registries,
  and install/uninstall/list lifecycle manager
- Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out
  enabled/disabled resolution
- TUI /plugins command for listing installed plugins
- 54 tests across internal/mcp and internal/plugin packages
2026-04-12 03:09:05 +02:00
vikingowl 893880039b feat(skill): TUI integration — /skillname invokes skills, /skills lists them
2026-04-07 02:18:12 +02:00
vikingowl 12ace89e31 feat: interactive session picker for /resume and --resume
2026-04-06 00:22:52 +02:00
vikingowl 167db19bfb feat: /resume TUI command + SessionStore in tui.Config
- Add SessionStore field to tui.Config
- Add /resume slash command: lists sessions or restores by ID
- Pass SessionStore to tui.New in main.go
- Update /help text to include /resume
- Add .gnoma/sessions/ to .gitignore
2026-04-05 23:51:48 +02:00
vikingowl 4f1e0cf567 feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
  args in the first streaming chunk then repeats them as delta, causing
  doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
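
The doubled-args fix can be pictured as an accumulator that stops appending once the first chunk already held complete arguments. This is a sketch under assumptions: only the `argsComplete` flag name comes from the commit; the `argAccumulator` type and the use of `json.Valid` to detect a complete first chunk are illustrative guesses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// argAccumulator collects streamed tool-call argument fragments.
// Hypothetical type; only the argsComplete flag name is from the commit.
type argAccumulator struct {
	buf          strings.Builder
	seenFirst    bool
	argsComplete bool
}

// Add appends a chunk unless the first chunk was already a complete
// JSON document, in which case later chunks are treated as repeats.
func (a *argAccumulator) Add(chunk string) {
	if a.argsComplete {
		return
	}
	a.buf.WriteString(chunk)
	if !a.seenFirst {
		a.seenFirst = true
		// Assumption: a syntactically valid first chunk means "complete args".
		a.argsComplete = json.Valid([]byte(chunk))
	}
}

func (a *argAccumulator) String() string { return a.buf.String() }

func main() {
	var dup argAccumulator
	dup.Add(`{"path":"a.go"}`) // complete args in the first chunk
	dup.Add(`{"path":"a.go"}`) // repeated as a delta, ignored
	fmt.Println(dup.String())

	var inc argAccumulator
	inc.Add(`{"path":`) // ordinary incremental stream
	inc.Add(`"a.go"}`)
	fmt.Println(inc.String())
}
```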

cmd/gnoma:
- Early TTY detection so logger is created with correct destination
  before any component gets a reference to it (fixes slog WARN bleed
  into TUI textarea)

permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
  text may legitimately mention .env/.ssh/credentials patterns and
  should not be blocked

tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
  (ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
  content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
  and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
  <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
  from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
2026-04-05 19:24:51 +02:00
vikingowl 11363f3b97 feat: M1-M7 gap audit phase 2 — security, TUI, context, router feedback
Gap 6 (M3): 7 new bash security checks (8-14)
- JQ injection, obfuscated flags (Unicode lookalike hyphens),
  /proc/environ access, brace expansion, Unicode whitespace,
  zsh dangerous constructs, comment-quote desync
- Total: 14 checks (was 7)

Gap 7 (M5): Model picker numbered selection
- /model shows numbered sorted list, /model 3 picks by number

Gap 8 (M5): /config set command
- /config set provider.default mistral writes to .gnoma/config.toml
- Whitelisted keys: provider.default, provider.model, permission.mode
- New config/write.go with TOML round-trip via BurntSushi/toml

Gap 9 (M6): Simple token estimator
- EstimateTokens (len/4 heuristic), EstimateMessages (content + overhead)
- PreEstimate on Tracker for proactive compaction triggering
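
A minimal sketch of the len/4 heuristic described above. The `Message` type and the `perMessageOverhead` value are assumptions for illustration; the commit only states that `EstimateMessages` counts content plus overhead.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for the project's message type.
type Message struct {
	Role    string
	Content string
}

// perMessageOverhead is an assumed fixed cost per message for
// role and framing tokens; the real value is not given in the commit.
const perMessageOverhead = 4

// EstimateTokens applies the len/4 heuristic: about one token per four bytes.
func EstimateTokens(s string) int {
	return len(s) / 4
}

// EstimateMessages sums the content estimate and overhead of each message.
func EstimateMessages(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += EstimateTokens(m.Content) + perMessageOverhead
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "list the Go files in this repo"},
		{Role: "assistant", Content: "Running fs.glob for *.go"},
	}
	fmt.Println(EstimateMessages(msgs))
}
```

The estimate is deliberately cheap: it needs no tokenizer, so it can run before a request is sent and trigger compaction proactively.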

Gap 10 (M7): Router quality feedback from elfs
- Router.Outcome + ReportOutcome (logs for now, M9 bandit uses later)
- Manager tracks armID/taskType per elf via elfMeta map
- Manager.ReportResult called after elf completion in both agent + batch tools
2026-04-04 11:07:08 +02:00
vikingowl de1798ff5c fix: M1-M7 gap audit phase 1 — bug fix + 5 quick wins
Bug fix:
- window.go: token ratio after compaction used len(w.messages) after
  reassignment, always producing ratio ~1.0. Fixed by saving original
  length before assignment.
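
The shape of this bug and its fix, sketched under assumptions: the `Window` type here is a simplified stand-in, and message counts substitute for the token figures the real code works with.

```go
package main

import "fmt"

// Window is a simplified, hypothetical stand-in for the context window.
type Window struct {
	messages []string
}

// Compact keeps the most recent keep messages and returns kept/original.
func (w *Window) Compact(keep int) float64 {
	before := len(w.messages) // saved before w.messages is reassigned
	if before == 0 {
		return 1.0
	}
	if keep > before {
		keep = before
	}
	if keep < 0 {
		keep = 0
	}
	w.messages = w.messages[before-keep:]
	// Dividing by len(w.messages) here instead of before would compare
	// the compacted slice with itself and always yield 1.0.
	return float64(len(w.messages)) / float64(before)
}

func main() {
	w := &Window{messages: make([]string, 10)}
	fmt.Println(w.Compact(5))
}
```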

Gap 1 (M3): Scanner patterns 13 → 47
- Added 34 new patterns: Azure, DigitalOcean, HuggingFace, Grafana,
  GitHub extended (app/oauth/refresh), Shopify, Twilio, SendGrid,
  NPM, PyPI, Databricks, Pulumi, Postman, Sentry, Anthropic admin,
  OpenAI extended, Vault, Supabase, Telegram, Discord, JWT, Heroku,
  Mailgun, Figma

Gap 2 (M3): Config security section
- SecuritySection with EntropyThreshold + custom PatternConfig
- Wire custom patterns from TOML into scanner at startup

Gap 3 (M4): Polling discovery loop
- StartDiscoveryLoop with 30s ticker, reconciles arms vs discovered
- Router.RemoveArm for disappeared local models

Gap 4 (M5): Incognito LocalOnly enforcement
- Router.SetLocalOnly filters non-local arms in Select()
- TUI incognito toggle (Ctrl+X, /incognito) sets local-only routing

Gap 5 (M6): Reactive 413 compaction
- Window.ForceCompact() bypasses ShouldCompact threshold
- Engine handles 413 with emergency compact + retry
2026-04-03 23:11:08 +02:00
vikingowl abb3e3ca90 feat: spawn_elfs batch tool for guaranteed parallel elf execution
New spawn_elfs tool takes array of tasks, spawns all elfs simultaneously.
Solves the problem of models (Mistral Small, Devstral) that serialize
tool calls instead of batching them.

Schema: {"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30}
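
A sketch of how the schema above could map onto Go types. The JSON field names come from the schema line; the struct names, `parseSpawnElfs`, and the empty-task check are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// elfTask and spawnElfsArgs mirror the JSON schema; the Go names are illustrative.
type elfTask struct {
	Prompt   string `json:"prompt"`
	TaskType string `json:"task_type"`
}

type spawnElfsArgs struct {
	Tasks    []elfTask `json:"tasks"`
	MaxTurns int       `json:"max_turns"`
}

// parseSpawnElfs decodes tool arguments and rejects an empty task list.
func parseSpawnElfs(raw []byte) (spawnElfsArgs, error) {
	var args spawnElfsArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return spawnElfsArgs{}, err
	}
	if len(args.Tasks) == 0 {
		return spawnElfsArgs{}, errors.New("spawn_elfs: no tasks given")
	}
	return args, nil
}

func main() {
	raw := []byte(`{"tasks": [{"prompt": "audit the router", "task_type": "review"}], "max_turns": 30}`)
	args, err := parseSpawnElfs(raw)
	fmt.Println(len(args.Tasks), args.MaxTurns, err)
}
```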

Also:
- Suppress spawn_elfs tool output from chat (tree handles display)
- Update M7 milestones to reflect completed deliverables
- Add CC-inspired features to M8/M10: task notification system,
  task framework, /batch skill, coordinator mode, StreamingToolExecutor,
  git worktree isolation
2026-04-03 21:03:51 +02:00
vikingowl e1a47a7620 feat: rate limit pools, elf tree view, permission prompts, dep updates
Rate limits:
- Add PoolRPS/PoolTPM/PoolTokensMonth/PoolCostMonth pool kinds
- Provider defaults for Mistral/Anthropic/OpenAI/Google (tier-aware)
- Config override via [rate_limits.<provider>] TOML section
- Pools auto-attached to arms on registration

Elf tree view (CC-style):
- Structured elf.Progress type replaces flat string channel
- Tree with ├─/└─ branches, per-elf stats (tool uses, tokens)
- Live activity updates: tool calls, "generating… (N chars)"
- Completed elfs stay in tree with "Done (duration)" until turn ends
- Suppress raw elf output from chat (tree + LLM summary instead)
- Remove background elf mode (wait: false) — always wait
- Truncate elf results to 2000 chars for parent context
- Parallel hint in system prompt and tool description

Permission prompts:
- Show actual command in prompt: "bash wants to execute: find . -name '*.go'"
- Compact hint in separator bar: "⚠ bash: find . | wc -l [y/n]"
- PermReqMsg carries tool name + args

Other:
- Fix /model not updating status bar (session.Local.SetModel)
- Add make targets: run, check, install
- Update deps: BurntSushi/toml v1.6.0, chroma v2.23.1, x/text v0.35.0, cloud.google.com/go v0.123.0
2026-04-03 20:54:48 +02:00
vikingowl c01069164e feat: live elf progress in TUI
- Elf tool calls show as 🦉 [elf] <prompt> (not ⚙ [agent])
- Live 2-line progress beneath the elf label showing what the
  elf is currently outputting (grey, auto-updated)
- Agent tool forwards elf streaming events via progress channel
- Progress cleared on turn completion
- elfProgressCh wired from agent tool → TUI
2026-04-03 19:25:43 +02:00
vikingowl 13db7521b1 feat: M7 Elfs — sub-agents with router-integrated spawning
internal/elf/:
- BackgroundElf: runs on own goroutine with independent engine,
  history, and provider. No shared mutable state.
- Manager: spawns elfs via router.Select() (picks best arm per
  task type), tracks lifecycle, WaitAll(), CancelAll(), Cleanup().

internal/tool/agent/:
- Agent tool: LLM can call 'agent' to spawn sub-agents.
  Supports task_type hint for routing, wait/background mode.
  5-minute timeout, context cancellation propagated.

Concurrent tool execution:
- Read-only tools (fs.read, fs.grep, fs.glob, etc.) execute in
  parallel via goroutines.
- Write tools (bash, fs.write, fs.edit) execute sequentially.
- Partition by tool.IsReadOnly().
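
The partitioning described above might look like the following. Only `IsReadOnly` is taken from the commit; the `Tool` interface, `stubTool`, and `runCalls` are hypothetical, and running all reads before any write is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// Tool is a hypothetical minimal interface for this sketch.
type Tool interface {
	Name() string
	IsReadOnly() bool
	Run() string
}

type stubTool struct {
	name     string
	readOnly bool
}

func (s stubTool) Name() string     { return s.name }
func (s stubTool) IsReadOnly() bool { return s.readOnly }
func (s stubTool) Run() string      { return "ran " + s.name }

// runCalls executes read-only calls concurrently, then write calls one by one.
// Results are stored by original index so the output order matches the input.
func runCalls(calls []Tool) []string {
	results := make([]string, len(calls))
	var writeIdx []int
	var wg sync.WaitGroup
	for i, c := range calls {
		if !c.IsReadOnly() {
			writeIdx = append(writeIdx, i)
			continue
		}
		wg.Add(1)
		go func(i int, c Tool) {
			defer wg.Done()
			results[i] = c.Run() // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait() // assumption: all reads finish before any write starts
	for _, i := range writeIdx {
		results[i] = calls[i].Run()
	}
	return results
}

func main() {
	calls := []Tool{
		stubTool{"fs.read", true},
		stubTool{"bash", false},
		stubTool{"fs.grep", true},
	}
	fmt.Println(runCalls(calls))
}
```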

TUI: /elf command explains how to use sub-agents.
5 elf tests. Exit criteria: parent spawns 3 background elfs on
different providers, collects and synthesizes results.
2026-04-03 19:16:46 +02:00
vikingowl bb93317fb6 feat: ctrl+o toggles tool output expand, fix auto default
- ctrl+o toggles between 10-line truncated and full tool output
- Label shows "ctrl+o to expand" (lowercase)
- Fixed: auto permission mode now sticks — config default was
  overriding flag default ("default" → "auto" in config defaults)
2026-04-03 19:00:59 +02:00
vikingowl 60883521c7 feat: auto permission mode, edit diffs, truncated tool output
- Default permission mode changed to 'auto' (read-only auto-allows,
  writes prompt)
- fs.edit now shows diff-style output: line numbers, context ±3 lines,
  + for added (green), - for removed (red)
- Tool output truncated to 10 lines in TUI with "+N lines (Ctrl+O
  to expand)" indicator
- Mistral SDK bumped to v1.3.0
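
A minimal sketch of the 10-line truncation with its "+N lines" indicator. The function name `truncateOutput` and the handling of trailing newlines are assumptions; the indicator text follows the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateOutput keeps the first maxLines lines and appends an indicator
// with the number of hidden lines. Trailing newlines are trimmed first so
// an empty final line is not counted.
func truncateOutput(out string, maxLines int) string {
	lines := strings.Split(strings.TrimRight(out, "\n"), "\n")
	if len(lines) <= maxLines {
		return strings.Join(lines, "\n")
	}
	hidden := len(lines) - maxLines
	kept := strings.Join(lines[:maxLines], "\n")
	return fmt.Sprintf("%s\n+%d lines (Ctrl+O to expand)", kept, hidden)
}

func main() {
	out := strings.Repeat("match\n", 12)
	fmt.Println(truncateOutput(out, 10))
}
```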
2026-04-03 18:57:13 +02:00
vikingowl 63f4c1389e feat: M6 complete — summarize strategy + tool result persistence
SummarizeStrategy: calls LLM to condense older messages into a
summary, preserving key decisions, file changes, tool outputs.
Falls back to truncation on failure. Keeps 6 recent messages.

Tool result persistence: outputs >50K chars saved to disk at
.gnoma/sessions/tool-results/{id}.txt with 2K preview inline.

TUI: /compact command for manual compaction, /clear now resets
engine history. Summarize strategy used by default (with
truncation fallback).
2026-04-03 18:51:28 +02:00
vikingowl 704f3a7302 feat: M6 context intelligence — token tracker + truncation compaction
internal/context/:
- Tracker: monitors token usage with OK/Warning/Critical states
  (thresholds from CC: 20K warning buffer, 13K autocompact buffer)
- TruncateStrategy: drops oldest messages, preserves system prompt +
  recent N turns, adds compaction boundary marker
- Window: manages message history with auto-compaction trigger,
  circuit breaker after 3 consecutive failures
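
One way to read the two buffers above is as distances from the model's context limit. That reading is an assumption of this sketch, as are the type and field names; only the three states and the 20K/13K figures come from the commit.

```go
package main

import "fmt"

// State is the tracker's health level.
type State int

const (
	StateOK State = iota
	StateWarning
	StateCritical
)

func (s State) String() string {
	return [...]string{"ok", "warning", "critical"}[s]
}

const (
	warningBuffer     = 20_000 // warn when this close to the limit
	autocompactBuffer = 13_000 // compact when this close to the limit
)

// Tracker is a hypothetical reduction of the real tracker to two counters.
type Tracker struct {
	used int
	max  int
}

// State classifies usage by how much headroom remains below max.
func (t *Tracker) State() State {
	switch {
	case t.used >= t.max-autocompactBuffer:
		return StateCritical
	case t.used >= t.max-warningBuffer:
		return StateWarning
	default:
		return StateOK
	}
}

func main() {
	for _, used := range []int{1_234, 181_000, 190_000} {
		t := &Tracker{used: used, max: 200_000}
		fmt.Println(used, t.State())
	}
}
```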

Engine integration:
- Context window tracks usage per turn
- Auto-compacts when critical threshold reached
- History syncs with context window after compaction

TUI status bar:
- Token count with percentage (tokens: 1234 (5%))
- Color-coded: green=ok, yellow=warning, red=critical

Session Status extended: TokensMax, TokenPercent, TokenState.
7 context tests.
2026-04-03 18:46:03 +02:00
vikingowl 3bdd9cec7e fix: raw text during streaming, glamour only on completed messages 2026-04-03 18:36:27 +02:00
vikingowl 7c96f6a291 feat: markdown rendering in chat via glamour
Assistant responses now rendered with full markdown support:
- Headers, bold, italic, strikethrough
- Code blocks with syntax highlighting
- Lists (ordered + unordered)
- Links, blockquotes
- Tables

Uses charm.land/glamour/v2 with dark theme. Renderer width
updates on terminal resize. Live markdown rendering during
streaming. Consistent ◆ prefix + indent for multi-line output.
2026-04-03 18:32:56 +02:00
vikingowl 92d2921ea1 fix: consistent indentation and AI icon in chat
- ❯ flush left for user input, continuation lines indented 2 spaces
- ◆ purple icon for AI responses, continuation indented
- User multiline messages: ❯ first line, indented rest
- Tool output: indented under parent
- System messages: • prefix with multiline indent
- Input area: no extra padding, ❯ at column 0
2026-04-03 18:25:37 +02:00
vikingowl aaf3c1e42c fix: strings.Builder panic — use pointer to avoid copy-by-value 2026-04-03 18:14:01 +02:00
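
Background for this fix: a `strings.Builder` records its own address on first write and panics if a later write goes through a copy. A struct that is passed by value, as UI models with value receivers are, copies any embedded Builder along with it. The sketch below shows the pointer-field approach; the `model` type and `streamBuf` field name here are illustrative, not the project's code.

```go
package main

import (
	"fmt"
	"strings"
)

// model mimics a UI model that is copied on every update (hypothetical type).
type model struct {
	// A pointer field: copies of model share one Builder, so the
	// Builder itself is never copied after its first write.
	streamBuf *strings.Builder
}

func newModel() model {
	return model{streamBuf: &strings.Builder{}}
}

// appendChunk has a value receiver, so m is a copy of the caller's model.
// With a plain strings.Builder field, the second call would panic.
func (m model) appendChunk(s string) model {
	m.streamBuf.WriteString(s)
	return m
}

func main() {
	m := newModel()
	m = m.appendChunk("hello, ")
	m = m.appendChunk("world")
	fmt.Println(m.streamBuf.String())
}
```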
vikingowl 84d9e636b7 feat: multiline input with auto-expanding textarea
Switch from textinput to textarea bubble:
- Enter submits message
- Shift+Enter / Ctrl+J inserts newline
- Input area auto-expands from 1 to 10 lines based on content
- Line numbers hidden, prompt preserved
2026-04-03 18:11:34 +02:00
vikingowl 7a417b9011 feat: /config, /model listing, /shell stub, /help update
/model — lists all available arms from router (local + API),
  shows capabilities (tools, thinking, vision), marks current.
/config — shows resolved config (provider, model, permission,
  incognito, cwd, git branch, config file paths).
/shell — stub explaining feature is coming.
/help — updated with all commands + keyboard shortcuts.

Router passed to TUI config for arm listing.
2026-04-03 18:01:20 +02:00
vikingowl 9a78af7b05 feat: inject mode changes into engine conversation history
Engine.InjectMessage() appends messages to history without triggering
a turn. When permission mode or incognito changes, the notification
is injected as a user+assistant pair so the model sees it as context.

Fixes: model now knows permissions changed and will retry tool calls
instead of remembering old denials from previous mode.
2026-04-03 16:42:52 +02:00
vikingowl 0f8bf7dbdd fix: mode change messages tell model to retry
When permission mode changes (Shift+Tab or /permission), the system
message now says "previous tool denials no longer apply, retry if
asked" — helps the model understand it should re-attempt tools
instead of remembering old denials from conversation history.
2026-04-03 16:38:03 +02:00