gnoma

Author	SHA1	Message	Date
vikingowl	0d3d190a8b	fix(slm,session,router): classifier-only SLMs + session error recovery + feasibility diagnostics Three coupled fixes that surfaced from a single FunctionGemma test session where the SLM-as-execution-arm assumption broke down and every subsequent prompt failed with 'session not idle (state: error)'. (A) [slm].register_as_arm config. The SLM has always been unconditionally registered as both classifier AND tier-0 execution arm. Fine for general-purpose models (ministral, qwen3-chat); breaks for task-specialised models (FunctionGemma emits function-call syntax instead of prose; embedding models can't generate). New pointer-bool config: nil/absent preserves the historical default (true), explicit false makes the SLM classifier-only and the execution path skips the slm/* arm. Three table tests cover absent / explicit-false / explicit-true decode paths. (B) Session error recovery. After any routing or engine error, the session moved to StateError and stayed there until restart — every new user prompt got rejected with 'session not idle (state: error)'. ResetError() was already wired for the /init retry path, but the general user-input and slash-command paths didn't call it. Added ResetError() before every user-initiated Send in the TUI so a fresh prompt always represents intent-to-retry. The /init internal retry already had its own ResetError; left alone. (C) filterFeasible per-arm rejection logging. Today's 'no feasible arm for task X' error tells you THAT every arm was rejected but nothing about WHY. Added slog.Debug per rejection (arm, task, complexity, reason, the specific violated constraint) plus a summary line when zero arms are feasible at any quality. Visible with --verbose; quiet otherwise. Surface area expansion only — no behaviour change for users not chasing a bug.	2026-05-25 01:57:16 +02:00
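The pointer-bool decode in (A) can be sketched as below. This is only an illustration of the nil / explicit-false / explicit-true distinction; the `SLMConfig` struct and the `registerAsArm` helper are hypothetical names, not taken from the repository.

```go
package main

import "fmt"

// SLMConfig is a hypothetical stand-in for the [slm] config section.
type SLMConfig struct {
	// RegisterAsArm is a pointer so an absent key (nil) can be
	// distinguished from an explicit `register_as_arm = false`.
	RegisterAsArm *bool
}

// registerAsArm resolves the tri-state value: nil keeps the
// historical default (true); otherwise the explicit value wins.
func registerAsArm(c SLMConfig) bool {
	if c.RegisterAsArm == nil {
		return true
	}
	return *c.RegisterAsArm
}

func main() {
	no, yes := false, true
	fmt.Println("absent:        ", registerAsArm(SLMConfig{}))
	fmt.Println("explicit false:", registerAsArm(SLMConfig{RegisterAsArm: &no}))
	fmt.Println("explicit true: ", registerAsArm(SLMConfig{RegisterAsArm: &yes}))
}
```

A plain `bool` field could not express "absent", because its zero value is indistinguishable from an explicit `false`.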
vikingowl	f3c70bd802	fix(slm,router): honest classifier diagnostics + 15s default timeout Five fixes folded into one commit because they all answer the same question: 'why does my router stats output lie to me?' Issue 1 (timeout). Default classify timeout was 5s — too short for cold-start ollama loads on small models. Bumped to 15s and surfaced as [slm].classify_timeout (0 = built-in default). Empirically caught when a user's reecdev/tiny3.5:1.5b hit 'stream error: context deadline exceeded' on every single classify call. Issue 2 (Warn-level error). The SLM-fallback path logged the underlying error at Debug, invisible without --verbose. Promoted to Warn so a first-time misconfiguration surfaces immediately. The fallback itself is benign; the signal is that the SLM isn't doing the work it was supposed to. Issue 3 (stats hint). Hard-coded 'check that llamafile boots' even when the user is on ollama. Replaced with backend-templated advice read from cfg.SLM.Backend. Also distinguishes three diagnostic cases that were collapsed before: - SLM never called (zero attempts) - SLM called N times but every call fell back (timeout/parse) - SLM working but minority share Issue 4 (effective heuristic share). The classifier breakdown shows 'heuristic' and 'slm_fallback' as separate sources, but both routed through HeuristicClassifier — only the source tag differs. New line under 'total observations' surfaces the combined share honestly: 'effective heuristic share: 100% (44 fallbacks + 10 pure heuristic)'. Issue 5 (config schema). [slm].classify_timeout joins the existing [slm] knobs alongside startup_timeout. Documented inline with the cold-start-load rationale.	2026-05-25 01:05:57 +02:00
vikingowl	eea26a262e	feat(router): surface bandit knobs as [router.bandit] config Four hardcoded constants in the selector and feedback tracker are now user-tunable via [router.bandit]: - quality_alpha (EMA smoothing, default 0.3) - min_observations (samples before observed overrides heuristic, default 3) - observed_weight (observed/heuristic blend ratio, default 0.7) - strength_bonus (quality bonus for Strengths-tagged arms, default 0.15) Each field treats 0 as 'use default', so an empty TOML block is byte-identical to pre-config behaviour. BanditParams is plumbed via router.Config{Bandit: ...} and resolveBanditParams() centralises the fallback so every call site shares the same defaults. QualityTracker, scoreArm, bestScored, and selectBest signatures now take the configured values directly rather than reaching for package- level constants. Tests updated to pass BanditParams{} (defaults) or explicit overrides where they validate the new tuning paths. Tracks item #3 from the 'Bandit selector — design decisions deferred' TODO entry — ships independently of the EMA vs SLM strategic decision.	2026-05-24 22:42:34 +02:00
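A minimal sketch of the "0 means use default" fallback described above. `BanditParams` and `resolveBanditParams` are named in the commit message, but the field names and the function signature here are assumptions; the default values are the ones the message lists.

```go
package main

import "fmt"

// BanditParams mirrors the four [router.bandit] knobs.
// Field names are assumed for illustration.
type BanditParams struct {
	QualityAlpha    float64 // EMA smoothing
	MinObservations int     // samples before observed overrides heuristic
	ObservedWeight  float64 // observed/heuristic blend ratio
	StrengthBonus   float64 // quality bonus for Strengths-tagged arms
}

// resolveBanditParams replaces every zero field with its default,
// so an empty TOML block behaves like the old hardcoded constants.
func resolveBanditParams(p BanditParams) BanditParams {
	if p.QualityAlpha == 0 {
		p.QualityAlpha = 0.3
	}
	if p.MinObservations == 0 {
		p.MinObservations = 3
	}
	if p.ObservedWeight == 0 {
		p.ObservedWeight = 0.7
	}
	if p.StrengthBonus == 0 {
		p.StrengthBonus = 0.15
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{}))
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{QualityAlpha: 0.5}))
}
```

One consequence of this convention is that a user cannot set a knob to exactly 0; whether that matters depends on the knob.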
vikingowl	3eeb5b46d7	feat(safety): pre-launch cwd classifier + context banner Implements S-1 through S-7 of the startup-safety-banner plan. Adds a pre-launch safety check that classifies the current working directory into three tiers and gates the launch: TierRefuse /, /etc, /sys, /proc, /usr, /var, /bin, /sbin, /boot, /root, /dev (Linux) and /System, /Library, /private, /Applications (macOS). Refuses with exit 2 unless --dangerously-allow-anywhere is passed. TierWarn $HOME, ~/Desktop, ~/Downloads, ~/Documents, ~/.config, ~/.local, ~/.cache, /tmp, and similar dumping grounds. Prints a banner and reads a single y/Y from stdin to confirm; any other input (or EOF, including piped/ scripted invocation) aborts with exit 1. TierOK Anywhere with a recognized project marker (.gnoma/, go.mod, package.json, pyproject.toml, Cargo.toml, Makefile, Dockerfile, build.gradle, pom.xml) or inside a git repo. No prompt; banner only. Project markers and git-repo presence override the TierWarn check — a project dir inside $HOME stays TierOK. The require_project_marker config knob can flip that for strict users. Container detection: when /.dockerenv or /run/.containerenv exists, TierRefuse downgrades to TierWarn (devcontainers often chroot to / or similar). Best-effort; false positives only soften the gate. The context banner is always rendered (TierOK, TierWarn, TierRefuse alike) and summarizes: cwd, git branch + dirty state, project type, provider/model, modes (permission, incognito, prefer), and a top-level sensitive-file inventory. Inventory matches .env, .env., env.local; private-key extensions (.pem, .key, .crt, .p12, .pfx); SSH key names (id_rsa, id_ed25519, ...); credentials files; .netrc / .pgpass; KeePass vaults; and .ssh/ .aws/ .kube/ .gcloud/ .azure/ .docker/ directories. Precision-tested: .envrc and secret_handler.go do NOT match. Bounded at 1000 entries. 
Architecture: - internal/safety/cwd.go — Classification + symlink-resolving tier classifier with platform-specific roots and container detection. - internal/safety/sensitive.go — pattern-based top-level scanner, deterministic ordering, scanLimit guard against pathological dirs. - internal/safety/banner.go — pure render functions for the warn prefix, refuse message, and context banner. Safe for golden-string testing. - internal/config/config.go — new [safety] section with three config keys, defaults applied via ResolvedSafety() helper. Pointer fields distinguish "user omitted" from "user set to false." - cmd/gnoma/main.go — gate runs after subcommand dispatch (so `gnoma providers / profile / slm / router` skip the prompt) and before provider creation. --dangerously-allow-anywhere bypasses the gate with an explicit log warning. The runtime keypress reads up to 8 bytes from os.Stdin and accepts only "y" / "Y" trimmed; EOF returns false (piped invocations without the flag will abort). Documented in the readYesConfirmation helper. Manual smoke (per plan): - `cd / && gnoma -p test` → refuses - `cd ~ && gnoma` → warns + keypress - `cd ~/git/some-repo && gnoma` → banner only - subcommands skip the gate entirely Linux + macOS classification; Windows path handling deferred per plan (treated as TierOK there until follow-up). Refs: docs/superpowers/plans/2026-05-23-startup-safety-banner.md	2026-05-23 22:19:39 +02:00
vikingowl	f9094f68f3	feat(router): [router].prefer = local | cloud | auto Implements P-1 through P-6 of the prefer-routing-policy plan. Adds a config knob that biases routing toward local arms, cloud arms, or leaves selection unchanged. Default "auto" is byte-identical to pre-change behavior (the new armTier path with PreferAuto returns the same value as the old single-arg function). Mechanism diverged from the plan after empirical testing: The plan called for a score multiplier applied in bestScored. Tests revealed the existing cost-floor math (scoreArm divides by weighted cost which collapses to ~0.001 for free local arms) gives local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier can't overcome. A tier-shift in armTier turned out cleaner: PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent) get +2 tier shift, landing behind locals. PreferCloud: IsLocal arms get +2 tier shift, landing behind cloud. SLM tier-0 arms shift to tier 2 — still below cloud's tier 3 — so the SLM-protection semantic (small stuff stays on the small model) survives PreferCloud. This matches the open question in the plan, now resolved as: yes, SLMs keep winning under PreferCloud by design. The policyMultiplier was kept in bestScored as a within-tier nudge (mostly cosmetic in practice given the cost-floor dynamics described above; could matter when costs are calibrated). Worth revisiting once router-wide cost calibration lands. Strengths cross-tier promotion is unaffected: the promoted-set path in selectBest bypasses armTier entirely, so a strongly-tagged cloud arm still wins SecurityReview tasks under PreferLocal (validated by TestPreferPolicy_StrengthsBeatsMultiplier). CLI-agent subprocess arms count as "local" for PreferLocal purposes — they proxy to cloud but the user-visible behavior is local. Users who want to exclude them can use --provider X. 
Forced arms (--provider X) and incognito take priority over the policy: forced arm test pins this, incognito-still-wins test pins the LocalOnly hard filter dominating PreferCloud. Test coverage (prefer_test.go): ParsePreferPolicy / String round trips; policyMultiplier table; acceptance scenarios across all three policies with adjacent-tier arms; SLM-still-wins under PreferCloud; Strengths beats multiplier; forced-arm bypass; incognito beats prefer; lone cloud arm wins when no local feasible. Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md	2026-05-23 22:13:26 +02:00
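The tier-shift mechanism can be illustrated with a simplified sketch. The base tier numbers (SLM 0, CLI agent 1, local 2, API 3) follow the ordering described in the log, but the `Arm` struct, the `IsSLM` flag, and the two-argument `armTier` are assumptions; the real function also takes the task into account.

```go
package main

import "fmt"

type PreferPolicy int

const (
	PreferAuto PreferPolicy = iota
	PreferLocal
	PreferCloud
)

// Arm carries only the flags needed for this sketch (hypothetical type).
type Arm struct {
	ID         string
	IsLocal    bool
	IsCLIAgent bool
	IsSLM      bool // assumption: marks the tier-0 small-model arm
}

// baseTier is the policy-free ordering: lower tiers are tried first.
func baseTier(a Arm) int {
	switch {
	case a.IsSLM:
		return 0
	case a.IsCLIAgent:
		return 1
	case a.IsLocal:
		return 2
	default:
		return 3
	}
}

// armTier applies the +2 shift: PreferLocal pushes true API arms back,
// PreferCloud pushes IsLocal arms back. PreferAuto changes nothing.
func armTier(a Arm, p PreferPolicy) int {
	t := baseTier(a)
	switch p {
	case PreferLocal:
		if !a.IsLocal && !a.IsCLIAgent {
			t += 2
		}
	case PreferCloud:
		if a.IsLocal {
			t += 2
		}
	}
	return t
}

func main() {
	arms := []Arm{
		{ID: "slm", IsLocal: true, IsSLM: true},
		{ID: "cli-agent", IsCLIAgent: true},
		{ID: "local", IsLocal: true},
		{ID: "cloud-api"},
	}
	for _, a := range arms {
		fmt.Printf("%-10s auto=%d local=%d cloud=%d\n", a.ID,
			armTier(a, PreferAuto), armTier(a, PreferLocal), armTier(a, PreferCloud))
	}
}
```

Under these assumptions the SLM arm lands on tier 2 with PreferCloud, one tier ahead of the cloud arm at tier 3, which matches the "SLMs keep winning" behaviour the message describes.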
vikingowl	49d80cf847	feat(security): format-aware entropy safelist (Phase F-1) Add a deterministic pre-extractor that skips known-safe token shapes before they reach the entropy scorer. Targets the false-positive regime that bites under lowered entropy_threshold or redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests (~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs. Config knob lives under the existing security section to match entropy_threshold / redact_high_entropy convention: [security] entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"] Empty / unset preserves pre-F-1 behaviour exactly — users opt in. Per-pattern Debug telemetry fires on every skip (pattern name + token length, never the token bytes). This is the data F-2's go/no-go gate depends on; the plan literally specifies it. NewFirewall validates names at the config boundary and emits a Warn for unknown entries so a typo like "uid" instead of "uuid" surfaces loudly instead of silently disabling FP reduction. Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold, mixed payload (safe shape + real secret) preserves the secret, secret-adjacent-to-UUID regression guard, empty safelist preserves pre-F-1 behaviour, unknown name silently dropped at scanner level but warned at firewall level, end-to-end FirewallConfig wiring, and the skip-telemetry log line. F-2 remains gated on real-workload FP-rate observations.	2026-05-22 12:39:10 +02:00
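A rough sketch of a format-aware pre-filter in front of an entropy scorer. Only two of the four pattern names (`uuid`, `sha_hex`) are shown; the regular expressions, the `matchSafelist` helper, and the per-character Shannon entropy function are assumptions for illustration, not the project's actual scanner.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// safeShapes maps safelist names to token shapes (assumed patterns).
var safeShapes = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$`),
}

// matchSafelist returns the first enabled pattern that matches the
// token. Unknown names are ignored here; the commit warns about them
// at the firewall level instead.
func matchSafelist(token string, enabled []string) (string, bool) {
	for _, name := range enabled {
		if re, ok := safeShapes[name]; ok && re.MatchString(token) {
			return name, true
		}
	}
	return "", false
}

// shannonBits computes per-character Shannon entropy in bits.
func shannonBits(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	token := "123e4567-e89b-12d3-a456-426614174000"
	if name, ok := matchSafelist(token, []string{"uuid", "sha_hex"}); ok {
		// Log only the pattern name and length, never the token bytes.
		fmt.Printf("skip pattern=%s len=%d\n", name, len(token))
		return
	}
	fmt.Printf("entropy=%.2f bits\n", shannonBits(token))
}
```

With an empty safelist every token falls through to the entropy scorer, which is the opt-in behaviour the commit describes.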
vikingowl	e38cce5f1f	fix(tui): security hardening, race-safety, and event handling fixes Bundles the pending TUI work into a coherent batch. Bug fixes from external review: * expandPlaceholders: single-pass alternation regex over the original input prevents `#p\d+` / `#img\d+` tokens inside pasted content from being re-expanded after the bracket form is inlined. * /incognito: gate savePromptHistory and the Ctrl+V image-write branch on `!m.incognito` so the no-persistence contract holds. * history.txt: write at mode 0600 (chmod existing 0644 files), create parent dir at 0700, truncate to 500 entries on every save, slog.Warn on errors instead of swallowing. * triggerPickerAction: guard m.config.Engine before SetModel, matching the /model handler. * Picker key handler: navigation/enter/q consume, escape/ctrl+c close the picker AND fall through to global handlers (so streaming cancel and double-tap quit work with an overlay open), default swallows stray input. * Paste line count: report total non-empty lines instead of newline count, ignoring trailing newlines (no more "+0 lines" for "abc"). * Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response bind. /keys help text updated; picker help entries reordered. * Tighter perms on .gnoma/pasted_image_*.png (0600). Race-safety refactor: ApplyTheme used to mutate ~25 package-level lipgloss styles in place. Replaced with an immutable themeStyles snapshot and atomic.Pointer[themeStyles] swap. Readers go through a theme() helper (one atomic load) instead of touching package vars directly. No locks, no nested-RLock risk if rendering ever moves off-thread. Includes pre-existing in-flight work: TUISection in config with persistent theme/vim settings; /copy /theme /vim slash commands; provider-name completion; session.SetProvider for the provider picker. 
Tests: placeholder_test.go (6 regression + happy-path cases including the pasted-content collision), history_test.go (5 cases covering perms on new and existing files, on-disk truncation, blank-input, newline flattening), provider_test.go (provider switching + picker transitions + SLM gating).	2026-05-22 11:50:12 +02:00
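The immutable-snapshot theme swap can be sketched with `sync/atomic`'s generic `Pointer` type (Go 1.19+). `themeStyles`, `ApplyTheme`, and `theme()` are names from the commit message; the single `Accent` field and the string-based styles are placeholders standing in for the lipgloss styles.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// themeStyles is an immutable snapshot; the real struct would hold
// lipgloss styles, a plain string is used here for brevity.
type themeStyles struct {
	Accent string
}

var current atomic.Pointer[themeStyles]

func init() {
	// Publish a default so theme() never returns nil.
	current.Store(&themeStyles{Accent: "default"})
}

// ApplyTheme publishes a fresh snapshot instead of mutating
// package-level styles in place.
func ApplyTheme(s themeStyles) {
	current.Store(&s)
}

// theme returns the current snapshot with a single atomic load.
func theme() *themeStyles {
	return current.Load()
}

func main() {
	fmt.Println(theme().Accent)
	ApplyTheme(themeStyles{Accent: "solarized"})
	fmt.Println(theme().Accent)
}
```

A reader that holds on to the pointer returned by `theme()` keeps seeing a consistent snapshot even if another goroutine swaps the theme mid-render.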
vikingowl	244ecd97e5	fix: security hardening (bash redirection, unicode sanitization, edit tool resolver)	2026-05-21 23:29:48 +02:00
vikingowl	c4fde583f5	chore(lint): gofmt sweep + errcheck cleanups in router discovery Apply gofmt -w across the codebase (struct field comment realignment only — no semantic changes) and silence two errcheck warnings on fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery with explicit `_, _ =` discards. Required so `make check` is green before tagging v0.1.0.	2026-05-20 03:13:05 +02:00
vikingowl	3c875276c9	feat(security): implement multi-wave audit remediation and agy provider support Implemented full security remediation following Universal Security Pilot protocol: - W1: Enforced SecureProvider at router and engine boundaries to prevent bypasses. - W1: Implemented path-sensitive policy for MCP tools. - W2: Added SHA256 hash verification for SLM downloads (llamafile). - W3: Enhanced secret redaction for private keys (full body) and high-entropy strings. - W4: Fixed symlink-based filesystem sandbox escapes in paths and grep. - W4: Documented CLI agent trust boundaries. Also added 'agy' (Antigravity) as a subprocess CLI provider with plain-text JSON schema support.	2026-05-20 01:13:13 +02:00
vikingowl	8450005b31	feat(cli): gnoma profile list/show subcommands (Phase C-2) `profile list` enumerates configured profiles and marks default + active. `profile show <name>` prints the merged effective config the profile would produce — sections, configured key names (values never), CLI agent overrides, arms, hooks, MCP servers, per-profile quality and session paths. Both commands work as a recovery affordance when profile resolution is broken: list flags a missing-default explicitly with "<name> (default, missing)", and the dispatcher falls back to a base-only load (new gnomacfg.LoadBase) so the diagnostics still run. API key values are filtered out of `profile show` — the output is safe to paste in a help channel or attach to a bug report.	2026-05-19 21:44:50 +02:00
vikingowl	635dad660c	feat(config): per-profile config layering with --profile flag (Phase C-1) Adds opt-in user profiles for swapping API keys, CLI binaries, and permission modes between contexts (work/private/experiment/...). Profile mode engages only when ~/.config/gnoma/profiles/ exists, so existing single-config installations are untouched. Selection order: --profile flag → default_profile in base config → fatal error. Layering: defaults → ~/.config/gnoma/config.toml → profiles/<name>.toml → <projectRoot>/.gnoma/config.toml → env. Map sections merge per-key; [[arms]] and [[mcp_servers]] merge by id/name; [[hooks]] appends. Per-profile data: quality-<name>.json and sessions/<name>/ keep the bandit and session list from cross-contaminating between profiles. Profile names restricted to [A-Za-z0-9_-] to block --profile=../foo path traversal into derived paths.	2026-05-19 21:35:33 +02:00
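The profile-name restriction can be sketched as a simple allow-list check. The character class is the one given in the commit message; the helper name and the decision to reject the empty string are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// profileNameRE allows only the characters listed in the commit
// message. Requiring at least one character is an assumption.
var profileNameRE = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// validProfileName rejects anything that could escape the profiles
// directory once the name is spliced into a derived path.
func validProfileName(name string) bool {
	return profileNameRE.MatchString(name)
}

func main() {
	for _, n := range []string{"work", "private-2", "../foo", "a/b", ""} {
		fmt.Printf("%q valid=%v\n", n, validProfileName(n))
	}
}
```

Because `.` and `/` are outside the allowed set, a value such as `--profile=../foo` fails validation before any path like `quality-<name>.json` is built.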
vikingowl	0aabd19906	feat(router): per-arm strengths + cost weight (Phase D) Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md (static portion; dynamic bandit-driven promotion deferred to D-2). Routing previously let tier ordering (CLI > local > API) dominate selection — Opus, in tier 3, would lose to a tier-1 CLI agent for SecurityReview even though Opus is empirically stronger at that task. This change introduces explicit per-arm overrides: [[arms]] id = "anthropic/claude-opus-4-7" strengths = ["security_review", "planning"] cost_weight = 0.3 Strengths gate cross-tier promotion: arms matching task.Type bypass the tier loop and compete with each other directly. Promotion is a preference, not a pin — if no strength-tagged arm is feasible (backoff, pool capacity, tool support), selection falls through to the default tier order. CostWeight linearly dampens the cost penalty in scoreArm via effectiveCost = 1 + CostWeight * (cost - 1) CostWeight=1.0 (or unset) preserves current behavior; lower values trade cheapness for quality. The earlier draft used cost^CostWeight which inverts direction for sub-1 local-arm costs (raising a fraction <1 to a fractional power makes it bigger, not smaller); a monotonicity regression test prevents that drift. - internal/router/arm.go: Strengths []TaskType, CostWeight float64, HasStrength(), ResolvedCostWeight() (zero → 1.0). - internal/router/selector.go: scoreArm strength bonus const (strengthScoreBonus = 0.15) + linear cost dampening; selectBest cross-tier promotion before tier loop. - internal/router/router.go: ArmOverride type + ApplyArmOverrides() returns unknown IDs; unknown strength names skipped with per-name warning via slog. - internal/router/task.go: ParseTaskTypeStrict() returns ok bool; ParseTaskType now delegates so the two switches stay in sync. - internal/config/config.go: ArmConfig + [[arms]] TOML wiring. 
- cmd/gnoma/main.go: applies overrides after all initial arms register; logs a warning when an [[arms]] id has no matching registered arm. Tests cover: predicate helpers, scoring direction across two arms, linear-formula monotonicity on both sides of cost=1, cross-tier promotion, empty-Strengths preserves tier order, promoted arm in backoff falls through via full Router.Select path, observed-quality tiebreak between two strength-tagged arms, ApplyArmOverrides happy path + unknown-ID reporting + unknown-strength skipping.	2026-05-19 21:14:45 +02:00
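The linear dampening formula from the message, `effectiveCost = 1 + CostWeight * (cost - 1)`, can be tried out with a small sketch. The function name and signature are assumptions; only the formula itself comes from the commit.

```go
package main

import "fmt"

// effectiveCost applies the linear dampening from the commit message.
// weight = 1 leaves cost unchanged; weight = 0 flattens every cost to 1.
func effectiveCost(cost, weight float64) float64 {
	return 1 + weight*(cost-1)
}

func main() {
	costs := []float64{0.001, 0.5, 1, 4}
	for _, w := range []float64{1.0, 0.3} {
		for _, c := range costs {
			fmt.Printf("weight=%.1f cost=%.3f effective=%.4f\n", w, c, effectiveCost(c, w))
		}
	}
}
```

For any positive weight the result increases with `cost` on both sides of `cost = 1`, which is the property the monotonicity regression test is described as guarding.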
vikingowl	b331dcd61a	feat(subprocess): per-agent binary override via [cli_agents] config Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Users with aliased CLI binaries (claude-priv, claude-work, gemini-personal) can now point gnoma's auto-discovery at them without renaming. The override flows through to the actual subprocess spawn at internal/provider/subprocess/provider.go:56, so routing through the alias is functional, not cosmetic. Config: [cli_agents] claude = "claude-priv" # discovery uses claude-priv instead of claude gemini = "" # empty value = no override (fall back to canonical) # vibe is absent = canonical name used - internal/config/config.go: CLIAgentsSection map[string]string; TOML [cli_agents] key. - internal/provider/subprocess/agent.go: - Package-level lookPath = exec.LookPath for test injection. - resolveAgentBinary(canonical, override) → (path, binName, err). Override='' falls back to canonical. Override set but missing from PATH returns an error (no silent fallback — masks user typos). - DiscoveredAgent.OverrideBinary records the override binary name when one was used; empty otherwise. - DiscoverCLIAgents(ctx, overrides) signature; warning logged when an override is configured but the binary isn't on PATH. - cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The `gnoma providers` listing renders `claude-priv (via [cli_agents].claude)` when an override is in effect. Tests cover: 5 resolver cases (no override, override set, empty override falls back, override missing, canonical missing); 4 discovery cases (no overrides, override resolves alias, empty value falls back, override missing skips agent); 2 config round-trip cases.	2026-05-19 21:02:16 +02:00
vikingowl	43ea2e562d	feat(engine): two-stage tool routing for small local arms Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Small local SLMs (<=16k context) waste ~1500 tokens per turn on the full tool catalogue. Two-stage routing replaces round-1 tools with a single synthetic select_category schema; round-2+ sends only the selected category's real tool schemas plus select_category for re-selection. - internal/tool/category.go: Category type, optional Categorized interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read, fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec. - internal/engine/twostage.go: synthetic select_category tool, intercept helper, per-turn selectedCategory state under e.mu. - Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to prose. State resets at the top and end of every runLoop. - Activates automatically on a forced local arm with ContextWindow <=16384, or via [router].force_two_stage TOML key. - Integration test drives a 3-round trip and asserts: round 1 emits exactly one schema (synthetic) with ToolChoiceRequired, round 2 contains only write-category schemas + select_category, real fs.write executes. Invalid-category fallback round-trips back to round-1 mode.	2026-05-19 20:53:21 +02:00
vikingowl	a14fe8b504	feat(slm): pluggable backends + trivial-prompt routing The SLM had two intended jobs — classify every prompt and execute the small ones itself — but in practice three independent gates kept it out of nearly all real work: 1. llamafile cold-start blocked pipe-mode runs (always faster than the 15 s health check) 2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm (ToolUse=false) from 9/10 task types 3. armTier hard-coded CLI agents > local > API, so even when the SLM arm was feasible a CLI agent won Each gate is addressed below. The result is an SLM that actually does its job — small stuff stays local, complex stuff routes up — gated by arm capability rather than by accidents of the boot order. Backend layer (the bigger change) The original implementation hard-coded llamafile. That's fine if you have nothing else, but most users with a local model setup already run Ollama or llama.cpp. The new factory at internal/slm/backend.go picks between: - ollama (any local Ollama daemon) - llamacpp (any llama.cpp server) - llamafile (gnoma-managed, current behaviour) - openaicompat (LM Studio, vLLM, remote API) - auto (probes in order, picks first reachable) - disabled [slm].backend in config.toml selects which. Documented in docs/slm-backends.md with copy-paste presets for each. The factory probes the underlying model's actual capabilities (Ollama /api/show, llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the arm picks up simple file-read style tasks on tool-capable models and stays knowledge-only on completion-only models. Trivial-prompt heuristic (Gate 2) ClassifyTask now flips RequiresTools=false for short, low-complexity prompts whose task type doesn't imply existing code (Explain, Generation, Boilerplate). Tool-needing tokens (read, write, run, test, file, …) keep RequiresTools=true even when the prompt is brief. 
Complexity-aware tier ordering (Gate 3) armTier takes a Task and returns tier 0 for arms whose MaxComplexity ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3. For trivial tasks the SLM arm wins; for complex tasks the SLM falls out of the feasible set (MaxComplexity exclusion) and the original ordering reasserts. Eager boot with user-facing wait (Gate 1) Removed the original goroutine-only path. SLM startup now blocks synchronously inside the factory; for llamafile that means up to [slm].startup_timeout (default 5 s) of waiting on the first invocation, with "Starting SLM…" → "SLM ready (backend, model, tools, boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp backends boot instantly because the daemon is already running. waitHealthy() now respects the caller's context deadline instead of its old hardcoded 15 s ceiling. Classifier reliability Classifier timeout bumped 2 s → 5 s for thinking-mode models like Qwen3-distilled Tiny3.5. System prompt includes /no_think directive for the same family. These help but don't eliminate small-model JSON-contract failures — see the docs section on picking a model. Probe + telemetry surfaces gnoma slm status now prints the configured backend + model + a live probe result (✓/✗) instead of just the llamafile manifest state. `gnoma router stats` already (from the previous commit) shows the classifier-source mix; with this change you can finally see slm / slm_fallback / heuristic share rise from "always heuristic" to something reflecting real SLM activity. Tests - 9 new backend-factory tests (httptest-backed Ollama probe, error paths, auto-detection, capability flags) - Tier-ordering tests cover the new "specialised small arm wins trivial task" path - Trivial-prompt heuristic tested for both halves (knowledge-only flips RequiresTools=false; debug/file/run keeps it true) Deletes the dead SLMManager field from the TUI Config — it was declared but never read.	2026-05-19 18:53:32 +02:00
vikingowl	ec9433d783	chore(lint): clear remaining errcheck and staticcheck findings Brings the project to a clean `make lint` baseline (0 issues). Mechanical: - Wrap deferred resp.Body.Close() in closures (router/discovery.go, router/probe.go) so the unchecked return surfaces as `_ = ...`. - Apply `_ = ...` (single or multi-return blank) to test-file calls that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send / LoadDir in tests that assert on side effects. Structural: - engine.handleRequestTooLarge drops the unused req parameter and rebuilds the request from compacted history (SA4009 — argument was overwritten before first use). - provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch to tagged switches over the discriminator (QF1002). - tui.app.go MouseWheel + inputMode and cmd/gnoma main slm-status use tagged switches in place of equality chains (QF1003). - cmd/gnoma main.go merges a var decl with its immediate assignment (S1021). - Three empty-branch sites (dispatcher_test, loader_test, coordinator_test) become real assertions or get the dead `if` removed (SA9003).	2026-05-19 17:53:42 +02:00
vikingowl	13b2f5e14d	chore(lint): clear dead code and tighten lifecycle errcheck Removes five unused funcs/vars/fields that golangci-lint had been flagging (anthropic.toolCallDoneEvent, mistral.translateMessages, hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase), two ineffectual assignments (tui/rendering.go visible-window loop, subprocess stream_test setup), and a stale if/HasPrefix that's now a strings.TrimPrefix. Wires errcheck onto every subprocess / stream lifecycle path so a failed close or shutdown is at least logged rather than silently dropped: - engine/loop.go: stream.Close on both the error and success paths - mcp/manager.go: Shutdown when StartAll partial-fails; Transport close after Initialize failure - mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout fallback - slm/download.go: Close propagated as a named-return error on the success path; explicitly discarded on the rollback path - slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go, config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit ignores or error logging on Close / Shutdown / WalkDir / Scanln Production-code errcheck and ineffassign are now zero. Remaining golangci-lint output is test-only Close-in-defer noise plus stylistic staticcheck QF suggestions, left alone.	2026-05-19 17:05:54 +02:00
vikingowl	0a1730943f	fix: provider-agnostic startup + slm setup auto-config Remove the hardcoded mistral default so gnoma starts without any provider configured. TUI mode uses a stubProvider that lets CLI agent arms (claude, gemini, etc.) handle routing; pipe mode prints a clear setup message. Also: gnoma slm setup now auto-writes the default model_url to the global config when none is set, instead of erroring.	2026-05-07 17:05:06 +02:00
vikingowl	a9213ec382	feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status - slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback, heuristic baseline blended so Priority/RequiredEffort are never zeroed, extractJSON strips markdown fences from small-model responses - router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration - router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior); filterFeasible excludes arms when task.ComplexityScore > MaxComplexity - config.SLMSection: [slm] enabled / model_url / data_dir - openaicompat.NewLlamafile: no API key, model = "default", no retries - slm.Manager: DefaultDataDir() (XDG), Manifest() accessor - cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm registered with MaxComplexity=0.3 when enabled + set up - tui: /config shows slm status (ready/missing/not set up + base URL if running) - docs: roadmap updated to reflect llamafile pivot from Ollama	2026-05-07 16:44:32 +02:00
vikingowl	6bb9c33d04	fix(m8): replace_default map, error UX, benchmarks, and launch prep - Fix replace_default positional bug: []string → map[string]string for explicit MCP tool → built-in name mapping - Improve error messages for missing API keys (3 actionable options) and unknown providers (early validation with available list) - Remove python3 dependency from MCP tests (pure bash grep/sed parsing) - Add router benchmark scaffold (6 benchmarks in bench_test.go + docs) - Add .goreleaser.yml for cross-platform binary releases with ldflags - Add launch-ready README with quickstart, extensibility docs, GIF placeholder - Add CONTRIBUTING.md and Gitea issue templates (bug report, feature request)	2026-04-12 03:34:58 +02:00
vikingowl	6c47f8643b	feat(m8): MCP client, tool replaceability, and plugin system Complete the remaining M8 extensibility deliverables: - MCP client with JSON-RPC 2.0 over stdio transport, protocol lifecycle (initialize/tools-list/tools-call), and process group management for clean shutdown - MCP tool adapter implementing tool.Tool with mcp__{server}__{tool} naming convention and replace_default for swapping built-in tools - MCP manager for multi-server orchestration with parallel startup, tool discovery, and registry integration - Plugin system with plugin.json manifest (name/version/capabilities), directory-based discovery (global + project scopes with precedence), loader that merges skills/hooks/MCP configs into existing registries, and install/uninstall/list lifecycle manager - Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out enabled/disabled resolution - TUI /plugins command for listing installed plugins - 54 tests across internal/mcp and internal/plugin packages	2026-04-12 03:09:05 +02:00
vikingowl	48c7b7aad4	feat(skill): pipe mode support and main.go wiring	2026-04-07 02:19:42 +02:00
vikingowl	1f620d2725	feat: hook config schema with user+project merge ordering	2026-04-07 00:50:53 +02:00
vikingowl	a7d86054de	feat: add Session config section (max_keep for session retention)	2026-04-05 23:37:10 +02:00
vikingowl	4f1e0cf567	feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes provider/openai: - Fix doubled tool call args (argsComplete flag): Ollama sends complete args in the first streaming chunk then repeats them as delta, causing doubled JSON and 400 errors in elfs - Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep) - Add Reasoning field support for Ollama thinking output cmd/gnoma: - Early TTY detection so logger is created with correct destination before any component gets a reference to it (fixes slog WARN bleed into TUI textarea) permission: - Exempt spawn_elfs and agent tools from safety scanner: elf prompt text may legitimately mention .env/.ssh/credentials patterns and should not be blocked tui/app: - /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge (ask for plain text output) → TUI fallback write from streamBuf - looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback content before writing (reject refusals, strip narrative preambles) - Collapse thinking output to 3 lines; ctrl+o to expand (live stream and committed messages) - Stream-level filter for model pseudo-tool-call blocks: suppresses <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|> from entering streamBuf across chunk boundaries - sanitizeAssistantText regex covers both block formats - Reset streamFilterClose at every turn start	2026-04-05 19:24:51 +02:00
vikingowl	11363f3b97	feat: M1-M7 gap audit phase 2 — security, TUI, context, router feedback Gap 6 (M3): 7 new bash security checks (8-14) - JQ injection, obfuscated flags (Unicode lookalike hyphens), /proc/environ access, brace expansion, Unicode whitespace, zsh dangerous constructs, comment-quote desync - Total: 14 checks (was 7) Gap 7 (M5): Model picker numbered selection - /model shows numbered sorted list, /model 3 picks by number Gap 8 (M5): /config set command - /config set provider.default mistral writes to .gnoma/config.toml - Whitelisted keys: provider.default, provider.model, permission.mode - New config/write.go with TOML round-trip via BurntSushi/toml Gap 9 (M6): Simple token estimator - EstimateTokens (len/4 heuristic), EstimateMessages (content + overhead) - PreEstimate on Tracker for proactive compaction triggering Gap 10 (M7): Router quality feedback from elfs - Router.Outcome + ReportOutcome (logs for now, M9 bandit uses later) - Manager tracks armID/taskType per elf via elfMeta map - Manager.ReportResult called after elf completion in both agent + batch tools	2026-04-04 11:07:08 +02:00
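The token estimator from Gap 9 is described only as a `len/4` heuristic plus a per-message overhead. A minimal sketch under those assumptions might look like the following; the `Message` type and the overhead constant of 4 are placeholders, not values taken from the repository.

```go
package main

import "fmt"

// Message is a hypothetical minimal chat message.
type Message struct {
	Role    string
	Content string
}

// perMessageOverhead is an assumed flat cost per message (role, framing).
const perMessageOverhead = 4

// estimateTokens applies the len/4 heuristic: roughly four bytes per token.
func estimateTokens(s string) int {
	return len(s) / 4
}

// estimateMessages sums the content estimate plus the flat overhead
// for every message in the conversation.
func estimateMessages(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += estimateTokens(m.Content) + perMessageOverhead
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "list the files in this repo"},
		{Role: "assistant", Content: "Running ls now."},
	}
	fmt.Println(estimateMessages(msgs))
}
```

An estimate like this is cheap enough to run before every request, which is what makes proactive compaction triggering practical; it trades accuracy for not needing a real tokenizer.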
vikingowl	de1798ff5c	fix: M1-M7 gap audit phase 1 — bug fix + 5 quick wins Bug fix: - window.go: token ratio after compaction used len(w.messages) after reassignment, always producing ratio ~1.0. Fixed by saving original length before assignment. Gap 1 (M3): Scanner patterns 13 → 47 - Added 34 new patterns: Azure, DigitalOcean, HuggingFace, Grafana, GitHub extended (app/oauth/refresh), Shopify, Twilio, SendGrid, NPM, PyPI, Databricks, Pulumi, Postman, Sentry, Anthropic admin, OpenAI extended, Vault, Supabase, Telegram, Discord, JWT, Heroku, Mailgun, Figma Gap 2 (M3): Config security section - SecuritySection with EntropyThreshold + custom PatternConfig - Wire custom patterns from TOML into scanner at startup Gap 3 (M4): Polling discovery loop - StartDiscoveryLoop with 30s ticker, reconciles arms vs discovered - Router.RemoveArm for disappeared local models Gap 4 (M5): Incognito LocalOnly enforcement - Router.SetLocalOnly filters non-local arms in Select() - TUI incognito toggle (Ctrl+X, /incognito) sets local-only routing Gap 5 (M6): Reactive 413 compaction - Window.ForceCompact() bypasses ShouldCompact threshold - Engine handles 413 with emergency compact + retry	2026-04-03 23:11:08 +02:00
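The `window.go` bug fixed in this commit is a common ordering mistake: measuring the slice after it has already been reassigned. The sketch below reproduces the fixed shape with a hypothetical `Window` type and a keep-the-newest compaction policy, both of which are assumptions made for illustration.

```go
package main

import "fmt"

// Window is a hypothetical, minimal context window.
type Window struct {
	messages []string
}

// compact keeps only the newest keep messages and returns the ratio of
// remaining to original message count. The original length is captured
// before w.messages is reassigned; reading len(w.messages) afterwards
// for both sides of the division would always yield 1.0.
func (w *Window) compact(keep int) float64 {
	before := len(w.messages)
	if before == 0 {
		return 1.0
	}
	if keep < 0 {
		keep = 0
	}
	if keep > before {
		keep = before
	}
	w.messages = w.messages[before-keep:]
	return float64(len(w.messages)) / float64(before)
}

func main() {
	w := &Window{messages: make([]string, 10)}
	fmt.Println(w.compact(5))
}
```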
vikingowl	e1a47a7620	feat: rate limit pools, elf tree view, permission prompts, dep updates Rate limits: - Add PoolRPS/PoolTPM/PoolTokensMonth/PoolCostMonth pool kinds - Provider defaults for Mistral/Anthropic/OpenAI/Google (tier-aware) - Config override via [rate_limits.<provider>] TOML section - Pools auto-attached to arms on registration Elf tree view (CC-style): - Structured elf.Progress type replaces flat string channel - Tree with ├─/└─ branches, per-elf stats (tool uses, tokens) - Live activity updates: tool calls, "generating… (N chars)" - Completed elfs stay in tree with "Done (duration)" until turn ends - Suppress raw elf output from chat (tree + LLM summary instead) - Remove background elf mode (wait: false) — always wait - Truncate elf results to 2000 chars for parent context - Parallel hint in system prompt and tool description Permission prompts: - Show actual command in prompt: "bash wants to execute: find . -name '*.go'" - Compact hint in separator bar: "⚠ bash: find . | wc -l [y/n]" - PermReqMsg carries tool name + args Other: - Fix /model not updating status bar (session.Local.SetModel) - Add make targets: run, check, install - Update deps: BurntSushi/toml v1.6.0, chroma v2.23.1, x/text v0.35.0, cloud.google.com/go v0.123.0	2026-04-03 20:54:48 +02:00
vikingowl	bb93317fb6	feat: ctrl+o toggles tool output expand, fix auto default - ctrl+o toggles between 10-line truncated and full tool output - Label shows "ctrl+o to expand" (lowercase) - Fixed: auto permission mode now sticks — config default was overriding flag default ("default" → "auto" in config defaults)	2026-04-03 19:00:59 +02:00
vikingowl	86aabd4946	feat: wire config system into main — TOML config now active config.Load() called at startup. Layered: defaults → global (~/.config/gnoma/config.toml) → project (.gnoma/config.toml) → env vars. CLI flags override config values. Config drives: - provider.default + provider.model as defaults - provider.api_keys for key resolution - provider.endpoints for custom base URLs - permission.mode + permission.rules loaded into checker - tools.bash_timeout passed to bash tool Example .gnoma/config.toml: [provider] default = "ollama" model = "qwen3:14b" [permission] mode = "bypass" [[permission.rules]] tool = "bash" pattern = "rm -rf" action = "deny"	2026-04-03 17:38:58 +02:00
vikingowl	291ed35341	feat: add TOML config system with layered loading Layers: defaults → ~/.config/gnoma/config.toml → .gnoma/config.toml → environment variables. Supports ${VAR} references in API keys, GNOMA_PROVIDER/GNOMA_MODEL env overrides, alternative env var names (ANTHROPICS_API_KEY, GOOGLE_API_KEY). Custom Duration type for TOML string parsing. 6 tests.	2026-04-03 13:51:03 +02:00
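The layering order described here (defaults → global file → project file → environment variables, with `${VAR}` references resolved in API keys) can be sketched as a sequence of overlays in which later layers win. The `Config` fields, the default provider, and the `GNOMA_DEMO_KEY` variable below are assumptions for illustration; the real loader reads TOML files and has more fields.

```go
package main

import (
	"fmt"
	"os"
)

// Config is a hypothetical three-field slice of the real configuration.
type Config struct {
	Provider string
	Model    string
	APIKey   string
}

// overlay copies every non-empty field of over onto base.
func overlay(base, over Config) Config {
	if over.Provider != "" {
		base.Provider = over.Provider
	}
	if over.Model != "" {
		base.Model = over.Model
	}
	if over.APIKey != "" {
		base.APIKey = over.APIKey
	}
	return base
}

// loadLayers applies the layers in order, so later layers override earlier ones.
func loadLayers(layers ...Config) Config {
	var cfg Config
	for _, layer := range layers {
		cfg = overlay(cfg, layer)
	}
	return cfg
}

// expandRefs resolves ${VAR} references through lookup.
// Passing os.Getenv as lookup reads from the process environment.
func expandRefs(s string, lookup func(string) string) string {
	return os.Expand(s, lookup)
}

func main() {
	defaults := Config{Provider: "mistral"} // assumed default, for illustration
	global := Config{Provider: "ollama", Model: "qwen3:14b", APIKey: "${GNOMA_DEMO_KEY}"}
	project := Config{Model: "gemma4"}
	env := Config{Provider: os.Getenv("GNOMA_PROVIDER"), Model: os.Getenv("GNOMA_MODEL")}

	cfg := loadLayers(defaults, global, project, env)
	cfg.APIKey = expandRefs(cfg.APIKey, os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Treating an empty field as "not set" keeps the merge simple, at the cost of not being able to override a value with an empty string; a real loader may need pointer fields or explicit presence tracking for that.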

32 Commits