gnoma

Author	SHA1	Message	Date
vikingowl	a23eb6b92c	style: gofmt drift from prior commits Pure whitespace cleanup surfaced when 'make check' ran gofmt over the tree. Mostly struct-field column alignment in internal/safety/banner.go (SessionInfo) and the var(...) flag block in cmd/gnoma/main.go after --dangerously-allow-anywhere was added without realignment. Verified zero substantive changes via 'git diff --ignore-all-space --ignore-blank-lines'.	2026-05-24 16:33:17 +02:00
vikingowl	847cd5fe0c	fix(security): use crypto/rand for session-ID suffix Semgrep flagged math/rand for the /tmp artifact-directory session-ID generation. Modern Go (1.20+) auto-seeds the global math/rand source so this wasn't exploitable in practice, but crypto/rand is the idiomatic choice for any security-adjacent identifier and removes the finding from future security audits. Drops the mrand alias entirely; reads 8 random bytes once and masks to 24 bits to preserve the existing %06x suffix format.	2026-05-24 16:22:50 +02:00
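A minimal sketch of the suffix generation this commit describes: read 8 bytes from crypto/rand once, mask to 24 bits, format with %06x. The helper name `sessionSuffix` and the big-endian decoding are assumptions for illustration, not the code in the repository.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// sessionSuffix is a hypothetical helper name. It reads 8 random bytes once,
// masks the value to 24 bits, and renders six lowercase hex digits.
func sessionSuffix() (string, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", fmt.Errorf("read random bytes: %w", err)
	}
	// Byte order is an assumption; any fixed order gives a uniform result.
	n := binary.BigEndian.Uint64(buf[:]) & 0xFFFFFF
	return fmt.Sprintf("%06x", n), nil
}

func main() {
	suffix, err := sessionSuffix()
	if err != nil {
		panic(err)
	}
	fmt.Println("session suffix:", suffix)
}
```

Masking to 24 bits keeps the value within six hex digits, so the existing suffix width is preserved.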
vikingowl	8ba77c1685	fix(safety): env-template precision, label alignment, banner on bypass Three polish items surfaced during the maintainer's manual smoke of the previous safety commit. env-template precision (false-positive fix): The "env file" rule matched .env.* universally, which flagged conventional templates like .env.example / .env.sample / .env.template / .env.dist / .env.default — these hold variable NAMES, no values, and are commonly committed. Now skipped. Real env files (.env, .env.local, .env.production) still match. New envTemplateSuffixes table + isEnvTemplate helper; check runs only inside the env-file rule so the suffix denylist is scoped. Tests added for both directions: 6 templates that must NOT flag, 6 real env files that must. Banner label alignment: Field labels were padded to 8 chars except "sensitive" at 9, producing visible misalignment in the rendered banner: cwd : /... provider : ollama / ... sensitive : 0 matches in cwd <- one extra space Padded all labels to 9 chars so the ":" separators line up. Context banner on bypass: --dangerously-allow-anywhere previously suppressed the entire safety block, including the informational context banner. Bypassing the GATE is not the same as opting out of the info — the user still wants to see cwd / git state / sensitive files nearby. Restructured the safety block so classification + banner always run; the bypass only skips the refuse/warn FLOW. The bypass warning log now also includes the classified tier and cwd path for diagnostics.	2026-05-23 22:32:26 +02:00
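The env-template exclusion above could look roughly like this. `envTemplateSuffixes` and `isEnvTemplate` are named in the commit; the exact-suffix comparison and the wrapper `matchesEnvFileRule` are assumptions, and only the five suffixes listed in the message are included.

```go
package main

import (
	"fmt"
	"strings"
)

// envTemplateSuffixes holds the template suffixes listed in the commit message.
var envTemplateSuffixes = []string{".example", ".sample", ".template", ".dist", ".default"}

// isEnvTemplate reports whether name is ".env" followed by a template suffix.
// Exact, case-sensitive matching is an assumption of this sketch.
func isEnvTemplate(name string) bool {
	if !strings.HasPrefix(name, ".env") {
		return false
	}
	rest := strings.TrimPrefix(name, ".env")
	for _, suffix := range envTemplateSuffixes {
		if rest == suffix {
			return true
		}
	}
	return false
}

// matchesEnvFileRule is a hypothetical stand-in for the "env file" rule:
// it flags .env and .env.<anything>, then skips templates.
func matchesEnvFileRule(name string) bool {
	if name != ".env" && !strings.HasPrefix(name, ".env.") {
		return false
	}
	return !isEnvTemplate(name)
}

func main() {
	names := []string{".env", ".env.local", ".env.production", ".env.example", ".env.dist", ".envrc"}
	for _, name := range names {
		fmt.Printf("%-16s flagged=%v\n", name, matchesEnvFileRule(name))
	}
}
```

Calling the template check only from inside the env-file rule mirrors the scoping the commit describes, so the suffix list cannot suppress other rules.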
vikingowl	3eeb5b46d7	feat(safety): pre-launch cwd classifier + context banner Implements S-1 through S-7 of the startup-safety-banner plan. Adds a pre-launch safety check that classifies the current working directory into three tiers and gates the launch: TierRefuse /, /etc, /sys, /proc, /usr, /var, /bin, /sbin, /boot, /root, /dev (Linux) and /System, /Library, /private, /Applications (macOS). Refuses with exit 2 unless --dangerously-allow-anywhere is passed. TierWarn $HOME, ~/Desktop, ~/Downloads, ~/Documents, ~/.config, ~/.local, ~/.cache, /tmp, and similar dumping grounds. Prints a banner and reads a single y/Y from stdin to confirm; any other input (or EOF, including piped/ scripted invocation) aborts with exit 1. TierOK Anywhere with a recognized project marker (.gnoma/, go.mod, package.json, pyproject.toml, Cargo.toml, Makefile, Dockerfile, build.gradle, pom.xml) or inside a git repo. No prompt; banner only. Project markers and git-repo presence override the TierWarn check — a project dir inside $HOME stays TierOK. The require_project_marker config knob can flip that for strict users. Container detection: when /.dockerenv or /run/.containerenv exists, TierRefuse downgrades to TierWarn (devcontainers often chroot to / or similar). Best-effort; false positives only soften the gate. The context banner is always rendered (TierOK, TierWarn, TierRefuse alike) and summarizes: cwd, git branch + dirty state, project type, provider/model, modes (permission, incognito, prefer), and a top-level sensitive-file inventory. Inventory matches .env, .env., env.local; private-key extensions (.pem, .key, .crt, .p12, .pfx); SSH key names (id_rsa, id_ed25519, ...); credentials files; .netrc / .pgpass; KeePass vaults; and .ssh/ .aws/ .kube/ .gcloud/ .azure/ .docker/ directories. Precision-tested: .envrc and secret_handler.go do NOT match. Bounded at 1000 entries. 
Architecture: - internal/safety/cwd.go — Classification + symlink-resolving tier classifier with platform-specific roots and container detection. - internal/safety/sensitive.go — pattern-based top-level scanner, deterministic ordering, scanLimit guard against pathological dirs. - internal/safety/banner.go — pure render functions for the warn prefix, refuse message, and context banner. Safe for golden-string testing. - internal/config/config.go — new [safety] section with three config keys, defaults applied via ResolvedSafety() helper. Pointer fields distinguish "user omitted" from "user set to false." - cmd/gnoma/main.go — gate runs after subcommand dispatch (so `gnoma providers / profile / slm / router` skip the prompt) and before provider creation. --dangerously-allow-anywhere bypasses the gate with an explicit log warning. The runtime keypress reads up to 8 bytes from os.Stdin and accepts only "y" / "Y" trimmed; EOF returns false (piped invocations without the flag will abort). Documented in the readYesConfirmation helper. Manual smoke (per plan): - `cd / && gnoma -p test` → refuses - `cd ~ && gnoma` → warns + keypress - `cd ~/git/some-repo && gnoma` → banner only - subcommands skip the gate entirely Linux + macOS classification; Windows path handling deferred per plan (treated as TierOK there until follow-up). Refs: docs/superpowers/plans/2026-05-23-startup-safety-banner.md	2026-05-23 22:19:39 +02:00
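The keypress contract (read up to 8 bytes, accept only a trimmed "y" or "Y", treat EOF as a refusal) can be sketched as follows. `readYesConfirmation` is named in the commit; its `io.Reader` parameter and the `isYes` split are assumptions made here so the logic can be exercised without a terminal.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// isYes accepts only "y" or "Y" after trimming surrounding whitespace.
func isYes(input []byte) bool {
	answer := strings.TrimSpace(string(input))
	return answer == "y" || answer == "Y"
}

// readYesConfirmation reads at most 8 bytes in a single Read call.
// Zero bytes (EOF, closed pipe) counts as "no". Taking an io.Reader
// instead of using os.Stdin directly is an assumption of this sketch.
func readYesConfirmation(r io.Reader) bool {
	buf := make([]byte, 8)
	n, _ := r.Read(buf) // a read error with n == 0 falls through to false
	if n == 0 {
		return false
	}
	return isYes(buf[:n])
}

func main() {
	for _, in := range []string{"y\n", "Y", "yes\n", "n\n", ""} {
		fmt.Printf("%q -> %v\n", in, readYesConfirmation(strings.NewReader(in)))
	}
}
```

Because an empty read returns false, a piped or scripted invocation without --dangerously-allow-anywhere aborts rather than silently confirming.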
vikingowl	f9094f68f3	feat(router): [router].prefer = local | cloud | auto Implements P-1 through P-6 of the prefer-routing-policy plan. Adds a config knob that biases routing toward local arms, cloud arms, or leaves selection unchanged. Default "auto" is byte-identical to pre-change behavior (the new armTier path with PreferAuto returns the same value as the old single-arg function). Mechanism diverged from the plan after empirical testing: The plan called for a score multiplier applied in bestScored. Tests revealed the existing cost-floor math (scoreArm divides by weighted cost which collapses to ~0.001 for free local arms) gives local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier can't overcome. A tier-shift in armTier turned out cleaner: PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent) get +2 tier shift, landing behind locals. PreferCloud: IsLocal arms get +2 tier shift, landing behind cloud. SLM tier-0 arms shift to tier 2 (still below cloud's tier 3), so the SLM-protection semantic (small stuff stays on the small model) survives PreferCloud. This matches the open question in the plan, now resolved as: yes, SLMs keep winning under PreferCloud by design. The policyMultiplier was kept in bestScored as a within-tier nudge (mostly cosmetic in practice given the cost-floor dynamics described above; could matter when costs are calibrated). Worth revisiting once router-wide cost calibration lands. Strengths cross-tier promotion is unaffected: the promoted-set path in selectBest bypasses armTier entirely, so a strongly-tagged cloud arm still wins SecurityReview tasks under PreferLocal (validated by TestPreferPolicy_StrengthsBeatsMultiplier). CLI-agent subprocess arms count as "local" for PreferLocal purposes; they proxy to cloud but the user-visible behavior is local. Users who want to exclude them can use --provider X. 
Forced arms (--provider X) and incognito take priority over the policy: forced arm test pins this, incognito-still-wins test pins the LocalOnly hard filter dominating PreferCloud. Test coverage (prefer_test.go): ParsePreferPolicy / String round trips; policyMultiplier table; acceptance scenarios across all three policies with adjacent-tier arms; SLM-still-wins under PreferCloud; Strengths beats multiplier; forced-arm bypass; incognito beats prefer; lone cloud arm wins when no local feasible. Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md	2026-05-23 22:13:26 +02:00
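The tier shift in this commit can be illustrated with a simplified model. The `Arm` struct, the `FitsSLM` field and `baseTier` are hypothetical stand-ins; the base tiers (0 for a fitting small arm, 1 CLI agent, 2 local, 3 API) and the +2 shifts follow the commit messages, but the real armTier also takes a Task and is not reproduced here.

```go
package main

import "fmt"

// PreferPolicy mirrors the three [router].prefer values.
type PreferPolicy int

const (
	PreferAuto PreferPolicy = iota
	PreferLocal
	PreferCloud
)

// Arm is a simplified stand-in for the router's arm type.
type Arm struct {
	Name       string
	IsLocal    bool
	IsCLIAgent bool
	FitsSLM    bool // hypothetical: small arm whose MaxComplexity fits the task
}

// baseTier models the pre-policy ordering: SLM 0, CLI agent 1, local 2, API 3.
func baseTier(a Arm) int {
	switch {
	case a.FitsSLM:
		return 0
	case a.IsCLIAgent:
		return 1
	case a.IsLocal:
		return 2
	default:
		return 3
	}
}

// armTier applies the +2 shift to the disfavoured side; PreferAuto is a no-op.
func armTier(a Arm, p PreferPolicy) int {
	tier := baseTier(a)
	isCloud := !a.IsLocal && !a.IsCLIAgent
	switch p {
	case PreferLocal:
		if isCloud {
			tier += 2
		}
	case PreferCloud:
		if a.IsLocal {
			tier += 2
		}
	}
	return tier
}

func main() {
	arms := []Arm{
		{Name: "slm", IsLocal: true, FitsSLM: true},
		{Name: "cli-agent", IsCLIAgent: true},
		{Name: "local", IsLocal: true},
		{Name: "cloud-api"},
	}
	for _, a := range arms {
		fmt.Printf("%-10s auto=%d local=%d cloud=%d\n", a.Name,
			armTier(a, PreferAuto), armTier(a, PreferLocal), armTier(a, PreferCloud))
	}
}
```

In this model a shifted SLM arm lands on tier 2, still ahead of the cloud arm on tier 3, which is the SLM-protection behaviour the commit calls out.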
vikingowl	49d80cf847	feat(security): format-aware entropy safelist (Phase F-1) Add a deterministic pre-extractor that skips known-safe token shapes before they reach the entropy scorer. Targets the false-positive regime that bites under lowered entropy_threshold or redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests (~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs. Config knob lives under the existing security section to match entropy_threshold / redact_high_entropy convention: [security] entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"] Empty / unset preserves pre-F-1 behaviour exactly — users opt in. Per-pattern Debug telemetry fires on every skip (pattern name + token length, never the token bytes). This is the data F-2's go/no-go gate depends on; the plan literally specifies it. NewFirewall validates names at the config boundary and emits a Warn for unknown entries so a typo like "uid" instead of "uuid" surfaces loudly instead of silently disabling FP reduction. Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold, mixed payload (safe shape + real secret) preserves the secret, secret-adjacent-to-UUID regression guard, empty safelist preserves pre-F-1 behaviour, unknown name silently dropped at scanner level but warned at firewall level, end-to-end FirewallConfig wiring, and the skip-telemetry log line. F-2 remains gated on real-workload FP-rate observations.	2026-05-22 12:39:10 +02:00
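A rough sketch of a format-aware pre-extractor in the spirit of F-1. The pattern names come from the commit's config example; the regular expressions, `safelistMatch` and `shannonEntropy` are assumptions written for illustration and are deliberately simpler than a production scanner would need.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// safeShapes maps safelist names to illustrative patterns (assumed, not the project's).
var safeShapes = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$`),
	"iso8601": regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$`),
	"url":     regexp.MustCompile(`^https?://\S+$`),
}

// safelistMatch returns the first enabled pattern name that matches token.
// Unknown names are skipped here; validation with a warning would live elsewhere.
func safelistMatch(token string, enabled []string) (string, bool) {
	for _, name := range enabled {
		re, known := safeShapes[name]
		if !known {
			continue
		}
		if re.MatchString(token) {
			return name, true
		}
	}
	return "", false
}

// shannonEntropy returns bits per character for s.
func shannonEntropy(s string) float64 {
	counts := map[rune]int{}
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	enabled := []string{"uuid", "sha_hex", "iso8601", "url"}
	for _, tok := range []string{
		"123e4567-e89b-12d3-a456-426614174000",
		"da39a3ee5e6b4b0d3255bfef95601890afd80709",
		"2026-05-22T12:39:10+02:00",
		"not-a-known-shape",
	} {
		name, ok := safelistMatch(tok, enabled)
		// Log the pattern name and length only, never the token bytes.
		fmt.Printf("len=%d skipped=%v pattern=%q entropy=%.2f\n", len(tok), ok, name, shannonEntropy(tok))
	}
}
```

An empty `enabled` slice never matches, which corresponds to the opt-in default described above.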
vikingowl	e38cce5f1f	fix(tui): security hardening, race-safety, and event handling fixes Bundles the pending TUI work into a coherent batch. Bug fixes from external review: * expandPlaceholders: single-pass alternation regex over the original input prevents `#p\d+` / `#img\d+` tokens inside pasted content from being re-expanded after the bracket form is inlined. * /incognito: gate savePromptHistory and the Ctrl+V image-write branch on `!m.incognito` so the no-persistence contract holds. * history.txt: write at mode 0600 (chmod existing 0644 files), create parent dir at 0700, truncate to 500 entries on every save, slog.Warn on errors instead of swallowing. * triggerPickerAction: guard m.config.Engine before SetModel, matching the /model handler. * Picker key handler: navigation/enter/q consume, escape/ctrl+c close the picker AND fall through to global handlers (so streaming cancel and double-tap quit work with an overlay open), default swallows stray input. * Paste line count: report total non-empty lines instead of newline count, ignoring trailing newlines (no more "+0 lines" for "abc"). * Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response bind. /keys help text updated; picker help entries reordered. * Tighter perms on .gnoma/pasted_image_*.png (0600). Race-safety refactor: ApplyTheme used to mutate ~25 package-level lipgloss styles in place. Replaced with an immutable themeStyles snapshot and atomic.Pointer[themeStyles] swap. Readers go through a theme() helper (one atomic load) instead of touching package vars directly. No locks, no nested-RLock risk if rendering ever moves off-thread. Includes pre-existing in-flight work: TUISection in config with persistent theme/vim settings; /copy /theme /vim slash commands; provider-name completion; session.SetProvider for the provider picker. 
Tests: placeholder_test.go (6 regression + happy-path cases including the pasted-content collision), history_test.go (5 cases covering perms on new and existing files, on-disk truncation, blank-input, newline flattening), provider_test.go (provider switching + picker transitions + SLM gating).	2026-05-22 11:50:12 +02:00
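The race-safety refactor swaps an immutable snapshot behind an atomic pointer. A minimal sketch of that pattern follows; plain strings stand in for the lipgloss styles, and the fields, `buildStyles` and the theme names are assumptions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// themeStyles is an immutable snapshot; fields here are illustrative stand-ins.
type themeStyles struct {
	Name   string
	Accent string
	Dim    string
}

// activeTheme holds the current snapshot; readers never see a half-updated theme.
var activeTheme atomic.Pointer[themeStyles]

// buildStyles constructs a fresh snapshot (hypothetical palette values).
func buildStyles(name string) *themeStyles {
	if name == "light" {
		return &themeStyles{Name: "light", Accent: "#005f87", Dim: "#6c6c6c"}
	}
	return &themeStyles{Name: "dark", Accent: "#5fd7ff", Dim: "#8a8a8a"}
}

// ApplyTheme publishes a new snapshot with a single atomic store.
func ApplyTheme(name string) {
	activeTheme.Store(buildStyles(name))
}

// theme returns the current snapshot with one atomic load.
func theme() *themeStyles {
	if s := activeTheme.Load(); s != nil {
		return s
	}
	return buildStyles("dark") // default before the first ApplyTheme call
}

func main() {
	ApplyTheme("dark")
	before := theme()
	ApplyTheme("light")
	fmt.Println("held snapshot:", before.Name, "current:", theme().Name)
}
```

A renderer that loaded a snapshot before the swap keeps a consistent set of styles for the rest of its frame, which is why no lock is needed.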
vikingowl	99fa0ff08e	refactor(providers): refresh defaults to current 2026 model lineup Bump hard-coded provider defaults to the May 2026 lineup: - Anthropic: claude-sonnet-4-6 (default); Opus 4.7 and Haiku 4.5 in the fallback list. 4.6/4.7 generation has 1M context standard. - OpenAI: gpt-5.5 (default); 5.5-pro / 5.2 / 5.2-chat-latest in fallback. ThinkingModes now baseline on GPT-5.x. - Google: gemini-3.5-flash (default); 3.1 Pro / Flash Lite in fallback. - Mistral: mistral-large-latest unchanged (Mistral Large 3); add mistral-medium-3.5, mistral-medium-2511, mistral-large-2512 to the rate-limit map. Legacy dated IDs retained in fallback lists and ratelimits maps so configs pinned to claude-sonnet-4-20250514 / gpt-4o / gemini-2.5-flash keep resolving. Capability tables (ContextWindow, MaxOutput, ThinkingModes) updated to match each generation. CLI help text in cmd/gnoma/main.go also updated.	2026-05-20 03:13:21 +02:00
vikingowl	c4fde583f5	chore(lint): gofmt sweep + errcheck cleanups in router discovery Apply gofmt -w across the codebase (struct field comment realignment only — no semantic changes) and silence two errcheck warnings on fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery with explicit `_, _ =` discards. Required so `make check` is green before tagging v0.1.0.	2026-05-20 03:13:05 +02:00
vikingowl	6322d10686	test: fix compilation errors in main and mcp tests after secure provider refactor	2026-05-20 01:21:52 +02:00
vikingowl	3c875276c9	feat(security): implement multi-wave audit remediation and agy provider support Implemented full security remediation following Universal Security Pilot protocol: - W1: Enforced SecureProvider at router and engine boundaries to prevent bypasses. - W1: Implemented path-sensitive policy for MCP tools. - W2: Added SHA256 hash verification for SLM downloads (llamafile). - W3: Enhanced secret redaction for private keys (full body) and high-entropy strings. - W4: Fixed symlink-based filesystem sandbox escapes in paths and grep. - W4: Documented CLI agent trust boundaries. Also added 'agy' (Antigravity) as a subprocess CLI provider with plain-text JSON schema support.	2026-05-20 01:13:13 +02:00
vikingowl	34f6f1c786	feat(security): incognito coherence across firewall/router/persist (Wave 2) Closes the cluster of audit findings where gnoma's incognito promise ('no persistence, no learning, local-only routing') silently broke because state was duplicated across the CLI flag, the firewall's IncognitoMode, the router's localOnly flag, and the TUI's local m.incognito field. Wave 2 makes security.IncognitoMode the canonical source of truth. W2-1 Router.Select rejects forced non-local arms when localOnly is on rather than short-circuiting and silently routing to cloud. Main fails fast when --incognito + --provider <cloud> are combined; the TUI toggle (Ctrl+X, /incognito, config panel) refuses with an actionable message when a non-local arm is pinned. Factored the three duplicated toggle sites into Model.attemptIncognitoToggle. W2-2 persist.Store.Save consults an IncognitoGate (local interface, security.IncognitoMode satisfies it). nil gate = always persist (legacy behaviour for tests); non-nil gate is consulted on every Save so TUI runtime toggles take effect without reconstructing the store. File mode 0o600, dir mode 0o700. W2-3 tui.New seeds m.incognito from cfg.Firewall.Incognito().Active(). Fixes the Ctrl+X-on-launch-with-incognito case where the first toggle silently turned the firewall OFF because the local flag started false out of sync with the firewall. W2-4 saveQuality gates on both incognito (defensive, covers the window before fwRef.Set fires) and fw.Incognito().ShouldLearn() (so TUI Ctrl+X suppresses the snapshot on exit). Quality restore skipped under --incognito. Quality file written 0o600 in dir 0o700. engine.reportOutcome and elf.Manager.ReportResult both gate on fw.Incognito().ShouldLearn() — bandit signal no longer leaks out of incognito sessions. W2-5 session files written 0o600 in dirs 0o700 (was 0o644 / 0o755). W2-6 IncognitoMode.LocalOnly dropped — dead field with no readers; routing local-only state lives on the router, not the firewall. 
Also wires rtr.SetLocalOnly(true) when --incognito at launch — main previously activated the firewall's flag but never told the router to filter, so even without the forced-arm bug, launching with --incognito alone gave you 'incognito badge but full arm pool'.	2026-05-19 22:57:36 +02:00
vikingowl	d6614545a9	feat(security): wrap engine.Config.Provider + SetProvider doc (W1 follow-up) Advisor flagged that engine.Config.Provider stayed raw, so the safety property was 'every call goes through buildRequest' instead of the stronger 'every Stream call routes through a SafeProvider.' Wrap it even though buildRequest still scans inline — at worst this costs one extra idempotent scan pass; it removes the 'someone adds a fifth engine Stream site that skips buildRequest' failure mode. Engine.SetProvider gets a doc comment establishing the wrap contract for callers. No active callers today, but documenting it now prevents the future bypass. Confirmed elf engines inherit the wrap automatically: - elf.Manager.Spawn passes arm.Provider (already *SafeProvider after W1-3a) - elf.Manager.SpawnWithProvider has no callers — dead code path Added the Wave 1 plan to TODO.md under active plans.	2026-05-19 22:37:24 +02:00
vikingowl	dc084d5a82	feat(security): wire SafeProvider into all provider sites (W1-2/3/4) Construct security.FirewallRef early in main() and Set it immediately after security.NewFirewall returns. Wrap every provider that may be called outside engine.buildRequest(): - primary provider arm (limitedProvider) - discovered local models (RegisterDiscoveredModels factory) - CLI agent arms (subprocprov.New) - background-discovery factory (StartDiscoveryLoop) - SLM arm + classifier transport - summarizer (gnomactx.NewSummarizeStrategy) routerStreamer and hook PromptExecutor inherit redaction automatically once every router arm is wrapped — they dispatch through router.Stream → arm.Provider.Stream. engine.Config.Provider stays raw because the engine still scans inline at buildRequest(); per the Wave 1 plan, removing that scan is deferred one release as belt-and-suspenders. Integration tests in internal/security/integration_test.go verify the boundary end-to-end: a router arm wrapped with WrapProvider redacts an 'sk-ant-...' literal before the inner provider sees it, and the pre-Set / post-Set transition works as documented (pass-through until the FirewallRef has a Firewall installed).	2026-05-19 22:33:24 +02:00
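The pass-through-until-Set behaviour of the wrapped provider can be sketched with heavily simplified types. `SafeProvider`, `WrapProvider` and `FirewallRef` are names from the commits; the one-method `Provider` interface, the string-based `Stream` signature and the key pattern are assumptions, not the project's API.

```go
package main

import (
	"fmt"
	"regexp"
	"sync/atomic"
)

// Provider is a simplified stand-in; the real interface likely takes a context and request.
type Provider interface {
	Stream(prompt string) string
}

// Firewall redacts one illustrative secret shape.
type Firewall struct{}

var anthropicKey = regexp.MustCompile(`sk-ant-[A-Za-z0-9_-]+`)

func (Firewall) Redact(s string) string {
	return anthropicKey.ReplaceAllString(s, "[REDACTED]")
}

// FirewallRef lets providers be wrapped before the firewall exists.
type FirewallRef struct {
	fw atomic.Pointer[Firewall]
}

func (r *FirewallRef) Set(fw *Firewall) { r.fw.Store(fw) }
func (r *FirewallRef) Get() *Firewall   { return r.fw.Load() }

// SafeProvider redacts outgoing prompts once a firewall is installed.
type SafeProvider struct {
	inner Provider
	ref   *FirewallRef
}

func WrapProvider(inner Provider, ref *FirewallRef) *SafeProvider {
	return &SafeProvider{inner: inner, ref: ref}
}

func (p *SafeProvider) Stream(prompt string) string {
	if fw := p.ref.Get(); fw != nil {
		prompt = fw.Redact(prompt) // inner provider never sees the literal
	}
	return p.inner.Stream(prompt)
}

// echoProvider returns what it received, making the boundary observable.
type echoProvider struct{}

func (echoProvider) Stream(prompt string) string { return prompt }

func main() {
	ref := &FirewallRef{}
	arm := WrapProvider(echoProvider{}, ref)
	fmt.Println("pre-Set: ", arm.Stream("key sk-ant-abc123"))
	ref.Set(&Firewall{})
	fmt.Println("post-Set:", arm.Stream("key sk-ant-abc123"))
}
```

Consulting the reference on every call is what allows arms to be wrapped at construction time, before the firewall has been created.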
vikingowl	d84b295da2	feat(tui): /profile slash command + status-bar profile badge (Phase C-3) Adds the in-TUI surface for the profile system: - Status bar carries " · profile: <name>" next to the SLM badge when profile mode is engaged (renders nothing in legacy single-config installations). - /profile (no args) shows the active profile and lists available ones. - /profile <name> switches by re-executing gnoma via syscall.Exec under --profile <name>. Critical cleanups (quality.json snapshot, SLM backend Close, session.Close) fire explicitly before exec since defers don't run after exec replaces the process image. Using syscall.Exec rather than a child process avoids stacking a process level on every switch and propagates the new gnoma's exit code directly to the shell. - Autocomplete after "/profile " offers configured profile names; the completion source is threaded from main.go via tui.Config. Conversation history is not preserved across a switch — profile change implies different context, different keys, different permission mode, so a clean reset is the correct semantic.	2026-05-19 21:59:11 +02:00
vikingowl	8450005b31	feat(cli): gnoma profile list/show subcommands (Phase C-2) `profile list` enumerates configured profiles and marks default + active. `profile show <name>` prints the merged effective config the profile would produce — sections, configured key names (values never), CLI agent overrides, arms, hooks, MCP servers, per-profile quality and session paths. Both commands work as a recovery affordance when profile resolution is broken: list flags a missing-default explicitly with "<name> (default, missing)", and the dispatcher falls back to a base-only load (new gnomacfg.LoadBase) so the diagnostics still run. API key values are filtered out of `profile show` — the output is safe to paste in a help channel or attach to a bug report.	2026-05-19 21:44:50 +02:00
vikingowl	635dad660c	feat(config): per-profile config layering with --profile flag (Phase C-1) Adds opt-in user profiles for swapping API keys, CLI binaries, and permission modes between contexts (work/private/experiment/...). Profile mode engages only when ~/.config/gnoma/profiles/ exists, so existing single-config installations are untouched. Selection order: --profile flag → default_profile in base config → fatal error. Layering: defaults → ~/.config/gnoma/config.toml → profiles/<name>.toml → <projectRoot>/.gnoma/config.toml → env. Map sections merge per-key; [[arms]] and [[mcp_servers]] merge by id/name; [[hooks]] appends. Per-profile data: quality-<name>.json and sessions/<name>/ keep the bandit and session list from cross-contaminating between profiles. Profile names restricted to [A-Za-z0-9_-] to block --profile=../foo path traversal into derived paths.	2026-05-19 21:35:33 +02:00
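The name restriction that blocks --profile=../foo can be sketched as an anchored character-class check applied before any path is derived. The helper names and the `dataDir` parameter are assumptions; the character set and the quality-<name>.json pattern come from the commit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// profileNameRE allows only the characters listed in the commit message.
var profileNameRE = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// validProfileName is a hypothetical helper name.
func validProfileName(name string) bool {
	return profileNameRE.MatchString(name)
}

// qualityPath derives the per-profile quality file, refusing invalid names
// before they reach filepath.Join. dataDir is an assumed parameter.
func qualityPath(dataDir, profile string) (string, error) {
	if !validProfileName(profile) {
		return "", fmt.Errorf("invalid profile name %q", profile)
	}
	return filepath.Join(dataDir, "quality-"+profile+".json"), nil
}

func main() {
	for _, name := range []string{"work", "private_2", "../foo", "a/b", ""} {
		p, err := qualityPath("/data/gnoma", name)
		fmt.Printf("%q -> %q err=%v\n", name, p, err)
	}
}
```

Validating once at selection time means every derived path (quality file, sessions directory) inherits the same guarantee.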
vikingowl	0aabd19906	feat(router): per-arm strengths + cost weight (Phase D) Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md (static portion; dynamic bandit-driven promotion deferred to D-2). Routing previously let tier ordering (CLI > local > API) dominate selection — Opus, in tier 3, would lose to a tier-1 CLI agent for SecurityReview even though Opus is empirically stronger at that task. This change introduces explicit per-arm overrides: [[arms]] id = "anthropic/claude-opus-4-7" strengths = ["security_review", "planning"] cost_weight = 0.3 Strengths gate cross-tier promotion: arms matching task.Type bypass the tier loop and compete with each other directly. Promotion is a preference, not a pin — if no strength-tagged arm is feasible (backoff, pool capacity, tool support), selection falls through to the default tier order. CostWeight linearly dampens the cost penalty in scoreArm via effectiveCost = 1 + CostWeight * (cost - 1) CostWeight=1.0 (or unset) preserves current behavior; lower values trade cheapness for quality. The earlier draft used cost^CostWeight which inverts direction for sub-1 local-arm costs (raising a fraction <1 to a fractional power makes it bigger, not smaller); a monotonicity regression test prevents that drift. - internal/router/arm.go: Strengths []TaskType, CostWeight float64, HasStrength(), ResolvedCostWeight() (zero → 1.0). - internal/router/selector.go: scoreArm strength bonus const (strengthScoreBonus = 0.15) + linear cost dampening; selectBest cross-tier promotion before tier loop. - internal/router/router.go: ArmOverride type + ApplyArmOverrides() returns unknown IDs; unknown strength names skipped with per-name warning via slog. - internal/router/task.go: ParseTaskTypeStrict() returns ok bool; ParseTaskType now delegates so the two switches stay in sync. - internal/config/config.go: ArmConfig + [[arms]] TOML wiring. 
- cmd/gnoma/main.go: applies overrides after all initial arms register; logs a warning when an [[arms]] id has no matching registered arm. Tests cover: predicate helpers, scoring direction across two arms, linear-formula monotonicity on both sides of cost=1, cross-tier promotion, empty-Strengths preserves tier order, promoted arm in backoff falls through via full Router.Select path, observed-quality tiebreak between two strength-tagged arms, ApplyArmOverrides happy path + unknown-ID reporting + unknown-strength skipping.	2026-05-19 21:14:45 +02:00
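The linear dampening formula from this commit, effectiveCost = 1 + CostWeight * (cost - 1), is small enough to show directly. The `Arm` struct is a stand-in; `ResolvedCostWeight` follows the zero-means-1.0 rule stated above, and `effectiveCost` as a free function is an assumption.

```go
package main

import "fmt"

// Arm is a minimal stand-in carrying only the cost weight.
type Arm struct {
	CostWeight float64
}

// ResolvedCostWeight treats an unset (zero) weight as 1.0.
func (a Arm) ResolvedCostWeight() float64 {
	if a.CostWeight == 0 {
		return 1.0
	}
	return a.CostWeight
}

// effectiveCost pulls cost toward 1 as weight shrinks; weight 1 leaves it unchanged.
func effectiveCost(cost, weight float64) float64 {
	return 1 + weight*(cost-1)
}

func main() {
	opus := Arm{CostWeight: 0.3}
	unset := Arm{}
	for _, cost := range []float64{0.25, 1, 4} {
		fmt.Printf("cost=%.2f weight=0.3 -> %.3f   weight=unset -> %.3f\n",
			cost,
			effectiveCost(cost, opus.ResolvedCostWeight()),
			effectiveCost(cost, unset.ResolvedCostWeight()))
	}
}
```

For any fixed positive weight the result increases with cost on both sides of cost = 1, which is the property the monotonicity regression test is described as guarding.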
vikingowl	b331dcd61a	feat(subprocess): per-agent binary override via [cli_agents] config Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Users with aliased CLI binaries (claude-priv, claude-work, gemini-personal) can now point gnoma's auto-discovery at them without renaming. The override flows through to the actual subprocess spawn at internal/provider/subprocess/provider.go:56, so routing through the alias is functional, not cosmetic. Config: [cli_agents] claude = "claude-priv" # discovery uses claude-priv instead of claude gemini = "" # empty value = no override (fall back to canonical) # vibe is absent = canonical name used - internal/config/config.go: CLIAgentsSection map[string]string; TOML [cli_agents] key. - internal/provider/subprocess/agent.go: - Package-level lookPath = exec.LookPath for test injection. - resolveAgentBinary(canonical, override) → (path, binName, err). Override='' falls back to canonical. Override set but missing from PATH returns an error (no silent fallback — masks user typos). - DiscoveredAgent.OverrideBinary records the override binary name when one was used; empty otherwise. - DiscoverCLIAgents(ctx, overrides) signature; warning logged when an override is configured but the binary isn't on PATH. - cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The `gnoma providers` listing renders `claude-priv (via [cli_agents].claude)` when an override is in effect. Tests cover: 5 resolver cases (no override, override set, empty override falls back, override missing, canonical missing); 4 discovery cases (no overrides, override resolves alias, empty value falls back, override missing skips agent); 2 config round-trip cases.	2026-05-19 21:02:16 +02:00
vikingowl	43ea2e562d	feat(engine): two-stage tool routing for small local arms Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Small local SLMs (<=16k context) waste ~1500 tokens per turn on the full tool catalogue. Two-stage routing replaces round-1 tools with a single synthetic select_category schema; round-2+ sends only the selected category's real tool schemas plus select_category for re-selection. - internal/tool/category.go: Category type, optional Categorized interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read, fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec. - internal/engine/twostage.go: synthetic select_category tool, intercept helper, per-turn selectedCategory state under e.mu. - Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to prose. State resets at the top and end of every runLoop. - Activates automatically on a forced local arm with ContextWindow <=16384, or via [router].force_two_stage TOML key. - Integration test drives a 3-round trip and asserts: round 1 emits exactly one schema (synthetic) with ToolChoiceRequired, round 2 contains only write-category schemas + select_category, real fs.write executes. Invalid-category fallback round-trips back to round-1 mode.	2026-05-19 20:53:21 +02:00
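A sketch of the schema selection for two-stage routing, using tool names in place of full schemas. The category mapping and the select_category name are taken from the commit; `toolsForRound`, the string-based catalogue and the `meta` constant value are simplifying assumptions.

```go
package main

import "fmt"

// Category groups tools for the second routing stage.
type Category string

const (
	CategoryRead   Category = "read"
	CategoryWrite  Category = "write"
	CategorySearch Category = "search"
	CategoryExec   Category = "exec"
	CategoryMeta   Category = "meta" // fallback for tools without a known category
)

const selectCategoryTool = "select_category"

var toolCategories = map[string]Category{
	"fs.read": CategoryRead, "fs.ls": CategoryRead,
	"fs.write": CategoryWrite, "fs.edit": CategoryWrite,
	"fs.glob": CategorySearch, "fs.grep": CategorySearch,
	"bash": CategoryExec,
}

// CategoryOf returns the tool's category, falling back to meta.
func CategoryOf(tool string) Category {
	if c, ok := toolCategories[tool]; ok {
		return c
	}
	return CategoryMeta
}

// toolsForRound returns the tool names to advertise. With no category selected
// (round 1) only the synthetic tool is sent; afterwards the selected category's
// tools are sent plus select_category for re-selection.
func toolsForRound(selected Category, catalogue []string) []string {
	if selected == "" {
		return []string{selectCategoryTool}
	}
	var out []string
	for _, name := range catalogue {
		if CategoryOf(name) == selected {
			out = append(out, name)
		}
	}
	return append(out, selectCategoryTool)
}

func main() {
	catalogue := []string{"fs.read", "fs.ls", "fs.write", "fs.edit", "fs.glob", "fs.grep", "bash"}
	fmt.Println("round 1:", toolsForRound("", catalogue))
	fmt.Println("round 2:", toolsForRound(CategoryWrite, catalogue))
}
```

Sending one synthetic schema in round 1 is what saves the context budget on small arms; the full catalogue is never sent at once.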
vikingowl	eb0583f606	fix(router): unpin config-default provider + complexity floor by task type Two routing bugs were keeping the SLM out of every real prompt and, once it was eligible, pulling complex tasks into it as well. Bug 1: ForceArm was called unconditionally when a primary provider was configured (cmd/gnoma/main.go:378). That short-circuited the entire router — every prompt went straight to whatever was set as [provider].default, regardless of tier, score, or feasibility. The SLM arm appeared in `gnoma router stats` registration logs but had zero observations after dozens of prompts. Fix: only pin when the user passed --provider on the command line. Config defaults register the arm but don't force it; the router picks freely. Verified end-to-end — trivial prompts now reach slm/ollama via the tier-0 priority. Bug 2: A short prompt like "refactor the SLM module" classifies as TaskRefactor with complexity 0.015 — well under the SLM arm's 0.3 ceiling. The arm became eligible despite the task being inherently non-trivial. Once eligible, tier-0 priority then pulled it in over the CLI agents. Fix: add MinComplexityForType, applied in both ClassifyTask (heuristic path) and slm.Classifier.Classify (SLM-overlay path). The floor is per-task-type: - TaskSecurityReview, TaskOrchestration → 0.60 - TaskRefactor, TaskPlanning, TaskDebug → 0.40 - TaskUnitTest, TaskReview → 0.35 Tasks like Explain/Generation/Boilerplate keep their organic complexity score so trivial knowledge prompts (≤0.15) still fall to the SLM. Tasks that imply existing code or multi-step reasoning are clamped above the SLM's MaxComplexity, naturally routing them to a bigger arm. After both fixes, observed routing in a clean run: What is 2+2? → slm/ollama (complexity 0.015) Define a closure → slm/ollama (complexity 0.015) What is HTTP? 
→ slm/ollama (complexity 0.015) Refactor the SLM module → subprocess/gemini (complexity 0.40) Audit for race conditions → subprocess/gemini (complexity 0.35) Plan a migration → subprocess/gemini (complexity 0.40)	2026-05-19 19:22:16 +02:00
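The per-task-type floor from Bug 2 amounts to a lookup followed by a max. `MinComplexityForType` and the floor values come from the commit; the string-typed `TaskType`, the constant names for the unfloored types and `clampComplexity` are assumptions for this sketch.

```go
package main

import "fmt"

// TaskType is a simplified stand-in for the router's task type.
type TaskType string

const (
	TaskSecurityReview TaskType = "security_review"
	TaskOrchestration  TaskType = "orchestration"
	TaskRefactor       TaskType = "refactor"
	TaskPlanning       TaskType = "planning"
	TaskDebug          TaskType = "debug"
	TaskUnitTest       TaskType = "unit_test"
	TaskReview         TaskType = "review"
	TaskExplain        TaskType = "explain"
)

// minComplexity holds the floors listed in the commit message.
var minComplexity = map[TaskType]float64{
	TaskSecurityReview: 0.60, TaskOrchestration: 0.60,
	TaskRefactor: 0.40, TaskPlanning: 0.40, TaskDebug: 0.40,
	TaskUnitTest: 0.35, TaskReview: 0.35,
}

// MinComplexityForType returns 0 for types without a floor (e.g. Explain).
func MinComplexityForType(t TaskType) float64 {
	return minComplexity[t]
}

// clampComplexity raises the organic score to the floor when it is lower.
func clampComplexity(t TaskType, organic float64) float64 {
	if floor := MinComplexityForType(t); organic < floor {
		return floor
	}
	return organic
}

func main() {
	const slmCeiling = 0.3 // SLM arm ceiling mentioned in the commit
	for _, t := range []TaskType{TaskExplain, TaskRefactor, TaskReview, TaskSecurityReview} {
		c := clampComplexity(t, 0.015)
		fmt.Printf("%-16s complexity=%.3f slm_eligible=%v\n", t, c, c <= slmCeiling)
	}
}
```

Every floor sits above the 0.3 ceiling, so a floored task type can never be eligible for the SLM arm regardless of prompt length.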
vikingowl	0b4de6054d	feat(tui): surface SLM backend + per-turn classifier in status bar The TUI gave no indication that an SLM was configured or active. You'd see the primary provider on the status line and nothing else, even with [slm].enabled=true and a successfully booted backend. Two surfaces added: 1. Status-bar SLM badge. The left side of the status line gains a dim " · slm: <model> ⚙" suffix when the backend booted, " · slm: ✗" when it failed, and nothing when SLM is disabled. The ⚙ marker indicates the model advertises tool support. 2. Per-turn classifier visibility. The existing routing event already produced "routed → <arm> (task: <type>)" lines in the chat history; it now also reports which classifier made the decision, e.g. "routed → ollama/ministral-3:3b (task: explain, by: slm_fallback)". Lets you tell in real time whether the SLM is actually classifying or falling back to the keyword heuristic. Plumbing: - new tui.SLMInfo struct on tui.Config - main.go populates it after StartBackend returns - stream.Event gains RoutingClassifier; engine.runLoop fills it from task.ClassifierSource on the first round	2026-05-19 19:06:26 +02:00
vikingowl	a14fe8b504	feat(slm): pluggable backends + trivial-prompt routing The SLM had two intended jobs — classify every prompt and execute the small ones itself — but in practice three independent gates kept it out of nearly all real work: 1. llamafile cold-start blocked pipe-mode runs (always faster than the 15 s health check) 2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm (ToolUse=false) from 9/10 task types 3. armTier hard-coded CLI agents > local > API, so even when the SLM arm was feasible a CLI agent won Each gate is addressed below. The result is an SLM that actually does its job — small stuff stays local, complex stuff routes up — gated by arm capability rather than by accidents of the boot order. Backend layer (the bigger change) The original implementation hard-coded llamafile. That's fine if you have nothing else, but most users with a local model setup already run Ollama or llama.cpp. The new factory at internal/slm/backend.go picks between: - ollama (any local Ollama daemon) - llamacpp (any llama.cpp server) - llamafile (gnoma-managed, current behaviour) - openaicompat (LM Studio, vLLM, remote API) - auto (probes in order, picks first reachable) - disabled [slm].backend in config.toml selects which. Documented in docs/slm-backends.md with copy-paste presets for each. The factory probes the underlying model's actual capabilities (Ollama /api/show, llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the arm picks up simple file-read style tasks on tool-capable models and stays knowledge-only on completion-only models. Trivial-prompt heuristic (Gate 2) ClassifyTask now flips RequiresTools=false for short, low-complexity prompts whose task type doesn't imply existing code (Explain, Generation, Boilerplate). Tool-needing tokens (read, write, run, test, file, …) keep RequiresTools=true even when the prompt is brief. 
Complexity-aware tier ordering (Gate 3) armTier takes a Task and returns tier 0 for arms whose MaxComplexity ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3. For trivial tasks the SLM arm wins; for complex tasks the SLM falls out of the feasible set (MaxComplexity exclusion) and the original ordering reasserts. Eager boot with user-facing wait (Gate 1) Removed the original goroutine-only path. SLM startup now blocks synchronously inside the factory; for llamafile that means up to [slm].startup_timeout (default 5 s) of waiting on the first invocation, with "Starting SLM…" → "SLM ready (backend, model, tools, boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp backends boot instantly because the daemon is already running. waitHealthy() now respects the caller's context deadline instead of its old hardcoded 15 s ceiling. Classifier reliability Classifier timeout bumped 2 s → 5 s for thinking-mode models like Qwen3-distilled Tiny3.5. System prompt includes /no_think directive for the same family. These help but don't eliminate small-model JSON-contract failures — see the docs section on picking a model. Probe + telemetry surfaces gnoma slm status now prints the configured backend + model + a live probe result (✓/✗) instead of just the llamafile manifest state. `gnoma router stats` already (from the previous commit) shows the classifier-source mix; with this change you can finally see slm / slm_fallback / heuristic share rise from "always heuristic" to something reflecting real SLM activity. Tests - 9 new backend-factory tests (httptest-backed Ollama probe, error paths, auto-detection, capability flags) - Tier-ordering tests cover the new "specialised small arm wins trivial task" path - Trivial-prompt heuristic tested for both halves (knowledge-only flips RequiresTools=false; debug/file/run keeps it true) Deletes the dead SLMManager field from the TUI Config — it was declared but never read.	2026-05-19 18:53:32 +02:00
vikingowl	58beb7ce3c	feat(router): classifier-source telemetry + router stats command Phase 4 routing decisions depend on knowing whether the SLM classifier is actually firing or whether the heuristic is silently doing all the work. Adds the instrumentation to make that observable. router.ClassifierSource enum (heuristic / slm / slm_fallback) is set on Task by every classifier: - HeuristicClassifier → ClassifierHeuristic - slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when the SLM call fails or returns unparseable output The source is plumbed through router.Outcome to QualityTracker, which now maintains per-source counters alongside the existing per-arm × task EMA scores. QualitySnapshot serializes both (classifier_counts is omitempty for back-compat with pre-feature quality.json files). lazyClassifier logs at INFO the first time it falls back to heuristic because the SLM hasn't booted yet — distinguishes operational fallback from an unconfigured-SLM run. slm.Manager.Start() now records elapsed-to-healthy and the main.go goroutine logs it as part of the "SLM ready" event. Confirms whether short-lived runs are racing the boot cycle. New `gnoma router stats` subcommand prints both tables (arm × task quality, classifier source breakdown) from quality.json with a Phase 4 trust hint when the data is too sparse or the SLM share is low. 6 new tests cover ClassifierSource string/enum, heuristic + SLM source propagation, QualityTracker counter round-trip, and back-compat restore from a legacy quality.json without classifier_counts.	2026-05-19 18:18:22 +02:00
vikingowl	ec9433d783	chore(lint): clear remaining errcheck and staticcheck findings Brings the project to a clean `make lint` baseline (0 issues). Mechanical: - Wrap deferred resp.Body.Close() in closures (router/discovery.go, router/probe.go) so the unchecked return surfaces as `_ = ...`. - Apply `_ = ...` (single or multi-return blank) to test-file calls that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send / LoadDir in tests that assert on side effects. Structural: - engine.handleRequestTooLarge drops the unused req parameter and rebuilds the request from compacted history (SA4009 — argument was overwritten before first use). - provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch to tagged switches over the discriminator (QF1002). - tui.app.go MouseWheel + inputMode and cmd/gnoma main slm-status use tagged switches in place of equality chains (QF1003). - cmd/gnoma main.go merges a var decl with its immediate assignment (S1021). - Three empty-branch sites (dispatcher_test, loader_test, coordinator_test) become real assertions or get the dead `if` removed (SA9003).	2026-05-19 17:53:42 +02:00
vikingowl	13b2f5e14d	chore(lint): clear dead code and tighten lifecycle errcheck Removes five unused funcs/vars/fields that golangci-lint had been flagging (anthropic.toolCallDoneEvent, mistral.translateMessages, hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase), two ineffectual assignments (tui/rendering.go visible-window loop, subprocess stream_test setup), and a stale if/HasPrefix that's now a strings.TrimPrefix. Wires errcheck onto every subprocess / stream lifecycle path so a failed close or shutdown is at least logged rather than silently dropped: - engine/loop.go: stream.Close on both the error and success paths - mcp/manager.go: Shutdown when StartAll partial-fails; Transport close after Initialize failure - mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout fallback - slm/download.go: Close propagated as a named-return error on the success path; explicitly discarded on the rollback path - slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go, config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit ignores or error logging on Close / Shutdown / WalkDir / Scanln Production-code errcheck and ineffassign are now zero. Remaining golangci-lint output is test-only Close-in-defer noise plus stylistic staticcheck QF suggestions, left alone.	2026-05-19 17:05:54 +02:00
vikingowl	dc438ea181	feat(plugin): trust-on-first-use manifest pinning Plugins are now verified against ~/.config/gnoma/plugins.pins.toml at load time. Each plugin's plugin.json bytes are hashed (SHA-256) and: - recorded automatically on first load (TOFU) with a prominent warning - compared on subsequent loads - refused with a clear error if the hash drifted, without overwriting the pin so the user can review and re-enrol deliberately Pin-store I/O failures degrade to load-without-pinning rather than locking the user out of previously-trusted plugins. Closes audit finding C2. See ADR-003 for the decision rationale and docs/plugins-trust.md for the end-user trust model.	2026-05-19 16:44:09 +02:00
vikingowl	b60aa02bfd	feat(fs): enforce workspace boundary on fs tools Adds a Guard that resolves every path against an allowlist of absolute roots (default: cwd) and rejects anything escaping via relative segments, absolute paths outside the root, or symlinks (including symlinked parents on writes). Closes audit finding C1: fs.read/fs.write/fs.edit/fs.glob/fs.grep/fs.ls previously accepted any absolute path; the only protection was a substring denylist (.env, .ssh/, ...) which missed /etc/shadow, kube configs, IDE secrets, and anything reachable via symlink.	2026-05-19 16:07:29 +02:00
vikingowl	d2139c6f0c	perf+feat: parallel startup discovery + slash-command suggestion dropdown Startup: HarvestAliases, HarvestInventory, DiscoverCLIAgents, and DiscoverLocalModels now run concurrently. Worst case latency drops from sum(all) to max(all) — eliminates the 15s inventory timeout from blocking the main path. TUI: typing '/co' now shows a bordered dropdown of all matching commands with descriptions. ↑↓ navigate, Tab/Enter accepts the highlighted entry, Esc dismisses. Ghost-text still works for unique unambiguous matches.	2026-05-07 17:30:16 +02:00
vikingowl	71f31559c2	feat(cli): add 'gnoma providers' subcommand Lists configured provider, auto-discovered CLI agents (claude/gemini/vibe), running local models (ollama/llamacpp), and SLM status in one shot.	2026-05-07 17:15:46 +02:00
vikingowl	adb4f5db5d	fix(slm): start llamafile in background; use lazyClassifier Blocking Start() call (up to 15s) no longer delays TUI startup. lazyClassifier falls back to heuristic until llamafile is healthy, then atomically swaps in the SLM classifier.	2026-05-07 17:13:56 +02:00
vikingowl	9037a0d195	fix(slm): skip re-download when already set up Setup() now returns early if Status() == StatusReady. CLI also prints the existing path/size instead of starting a download.	2026-05-07 17:10:16 +02:00
vikingowl	0a1730943f	fix: provider-agnostic startup + slm setup auto-config Remove the hardcoded mistral default so gnoma starts without any provider configured. TUI mode uses a stubProvider that lets CLI agent arms (claude, gemini, etc.) handle routing; pipe mode prints a clear setup message. Also: gnoma slm setup now auto-writes the default model_url to the global config when none is set, instead of erroring.	2026-05-07 17:05:06 +02:00
vikingowl	062566a23d	fix(cli): three UX issues — help output, TUI startup, setup command - Custom flag.Usage: shows subcommands and usage patterns; -h is no longer useless - system flag default is now '' (applies built-in at runtime); flag help no longer spews the entire system prompt - API key check skips hard-exit in TUI mode; TUI starts and surfaces auth errors inline on first request instead of blocking at launch - gnoma slm setup: progress shows speed (bytes/s), no hardcoded model URL in error message, points to llamafile releases page instead	2026-05-07 16:53:57 +02:00
vikingowl	a9213ec382	feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status - slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback, heuristic baseline blended so Priority/RequiredEffort are never zeroed, extractJSON strips markdown fences from small-model responses - router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration - router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior); filterFeasible excludes arms when task.ComplexityScore > MaxComplexity - config.SLMSection: [slm] enabled / model_url / data_dir - openaicompat.NewLlamafile: no API key, model = "default", no retries - slm.Manager: DefaultDataDir() (XDG), Manifest() accessor - cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm registered with MaxComplexity=0.3 when enabled + set up - tui: /config shows slm status (ready/missing/not set up + base URL if running) - docs: roadmap updated to reflect llamafile pivot from Ollama	2026-05-07 16:44:32 +02:00
vikingowl	176926924c	feat(engine): M8 cleanup — Wave B skill enforcement - Add tool.PathSensitiveTool interface (ExtractPaths); implement on all 6 fs tools - Add engine.TurnOptions.AllowedPaths: restricts tool filesystem access per skill invocation - Bash is denied outright when AllowedPaths is active (unparseable command args) - fs tools with empty path (cwd default) resolved via os.Getwd() and validated - Add engine.TurnOptions.AllowedTools + AllowedPaths wiring in pipe mode (main.go) and TUI skill dispatch (tui/app.go) - Remove TODO(M8.3) from skill.Frontmatter — enforcement is now complete	2026-05-07 15:29:33 +02:00
vikingowl	9fb520fba6	feat(engine): M8 cleanup — Wave A wiring gaps - Remove stale TODO(P0c) comment from main.go (resolved by P0c tier routing) - Wire config.Provider.Temperature → engine.Config.Temperature → provider.Request - Add WithMaxFileSize option to fs.write; wire cfg.Tools.MaxFileSize in main.go - Wire router.ReportOutcome after each runLoop return (success = err == nil) - Fix nil-callback guard on EventRouting dispatch (pre-existing bug exposed by new test)	2026-05-07 15:22:22 +02:00
vikingowl	6883c2a041	feat(router): tier-based routing — CLI > local > API, disabled arms Adds explicit tier preference to arm selection so the router deterministically prefers lower-cost arms before falling back: tier 0: CLI agents (IsCLIAgent=true, subprocess/claude\|gemini\|vibe) tier 1: local models (IsLocal=true, ollama/llamacpp) tier 2: API providers (everything else) Within a tier, quality/cost scoring still applies. filterFeasible still gates on quality thresholds, so a low-quality local arm won't beat a high-quality API arm when the task's minimum threshold rules it out. Also adds Arm.Disabled: arms with Disabled=true are excluded from auto-routing but remain selectable via ForceArm. Implementation: armTier helper + selectBest refactored to try tiers in order, bestScored picks within a tier. router.Select skips disabled arms in allArms collection (forced arm bypasses disable check).	2026-05-07 14:36:36 +02:00
vikingowl	44d0bdc032	feat(provider): subprocess CLI provider for claude, gemini, vibe Adds internal/provider/subprocess — a provider.Provider that spawns CLI agents (claude, gemini, vibe) as subprocesses and streams their output. - FormatParser interface + three parsers for claude-stream-json, gemini-stream-json, and vibe-streaming formats; fixtures captured from real binaries - subprocessStream: pull-based stream.Stream over subprocess stdout with bounded stderr capture (8KB) and guarded reap() to prevent double-Wait - DiscoverCLIAgents: parallel PATH scan with 10s timeout, stable ordering - Provider: only the last user message is passed as --prompt; all other request fields (history, tools, system prompt) are intentionally ignored (see package doc) - main.go: discover and register CLI arms at startup; TODO(P0c) for tier-based routing to enforce preference order explicitly	2026-05-07 14:29:34 +02:00
vikingowl	d71bd942c4	feat: local model reliability — SDK retries, capability probing, init skill, context compaction Three compounding bugs prevented tool calling with llama.cpp: - Stream parser set argsComplete on partial JSON (e.g. "{"), dropping subsequent argument deltas — fix: use json.Valid to detect completeness - Missing tool_choice default — llama.cpp needs explicit "auto" to activate its GBNF grammar constraint; now set when tools are present - Tool names in history used internal format (fs.ls) while definitions used API format (fs_ls) — now re-sanitized in translateMessage Additional changes: - Disable SDK retries for local providers (500s are deterministic) - Dynamic capability probing via /props (llama.cpp) and /api/show (Ollama), replacing hardcoded model prefix list - Engine respects forced arm ToolUse capability when router is active - Bundled /init skill with Go template blocks, context-aware for local vs cloud models, deduplication rules against CLAUDE.md - Tool result compaction for local models — previous round results replaced with size markers to stay within small context windows - Text-only fallback when tool-parse errors occur on local models - "text-only" TUI indicator when model lacks tool support - Session ResetError for retry after stream failures - AllowedTools per-turn filtering in engine buildRequest	2026-04-13 02:01:01 +02:00
vikingowl	0caab0fed1	fix(router): discovery loop removes forced arm, breaking routing The discovery loop's reconcileArms removed the CLI-forced arm (llamacpp/default) because the llama.cpp server reports the real model name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm disappeared and all subsequent requests failed. Three-layer fix: - Eager: query the specific provider at startup to resolve the real model name before registering the forced arm - Lazy: reconcileArms detects placeholder "default" arm names and atomically renames them when discovery reveals the real identity, with an onReconcile callback to update the session and TUI - Guard: the forced arm is never garbage-collected by the removal loop Also fixes misleading /init error messaging — failed inits now show "loaded from disk (init failed)" instead of "AGENTS.md written to".	2026-04-12 17:51:30 +02:00
vikingowl	e04cacc215	fix: append mutation, pipe-mode hang, Mistral regex false positives - Fix append footgun: allHooks/allMCPServers allocated fresh to avoid mutating cfg's backing array (lines 391/413 in main.go) - Fix pipe-mode permission prompt: detect no-TTY stdin and auto-deny instead of blocking forever on fmt.Scanln EOF - Tighten Mistral API key regex from bare [a-zA-Z0-9]{32} (matched commit hashes, UUIDs) to context-gated pattern requiring "mistral" keyword nearby. Added scanner test for positives and negatives. - Remove README demo GIF TODO placeholder - Unify version string: pass buildVersion from ldflags into tui.Config instead of hardcoding "v0.1.0-dev" - Populate benchmarks doc with actual Go benchmark results	2026-04-12 03:49:47 +02:00
vikingowl	6bb9c33d04	fix(m8): replace_default map, error UX, benchmarks, and launch prep - Fix replace_default positional bug: []string → map[string]string for explicit MCP tool → built-in name mapping - Improve error messages for missing API keys (3 actionable options) and unknown providers (early validation with available list) - Remove python3 dependency from MCP tests (pure bash grep/sed parsing) - Add router benchmark scaffold (6 benchmarks in bench_test.go + docs) - Add .goreleaser.yml for cross-platform binary releases with ldflags - Add launch-ready README with quickstart, extensibility docs, GIF placeholder - Add CONTRIBUTING.md and Gitea issue templates (bug report, feature request)	2026-04-12 03:34:58 +02:00
vikingowl	6c47f8643b	feat(m8): MCP client, tool replaceability, and plugin system Complete the remaining M8 extensibility deliverables: - MCP client with JSON-RPC 2.0 over stdio transport, protocol lifecycle (initialize/tools-list/tools-call), and process group management for clean shutdown - MCP tool adapter implementing tool.Tool with mcp__{server}__{tool} naming convention and replace_default for swapping built-in tools - MCP manager for multi-server orchestration with parallel startup, tool discovery, and registry integration - Plugin system with plugin.json manifest (name/version/capabilities), directory-based discovery (global + project scopes with precedence), loader that merges skills/hooks/MCP configs into existing registries, and install/uninstall/list lifecycle manager - Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out enabled/disabled resolution - TUI /plugins command for listing installed plugins - 54 tests across internal/mcp and internal/plugin packages	2026-04-12 03:09:05 +02:00
vikingowl	48c7b7aad4	feat(skill): pipe mode support and main.go wiring	2026-04-07 02:19:42 +02:00
vikingowl	8d9c521f7a	feat: wire hook dispatcher in main.go — SessionStart, SessionEnd, PreCompact	2026-04-07 01:08:40 +02:00
vikingowl	12ace89e31	feat: interactive session picker for /resume and --resume	2026-04-06 00:22:52 +02:00
vikingowl	ae9683818b	fix: session security and correctness — path traversal, turn count restore, incognito quality leak - store: validate session ID against store root to block path traversal in Load/Save - local: seed turnCount from LocalConfig.TurnCount so resumed sessions keep correct turn count - main: pass TurnCount from snapshot to LocalConfig on resume - main: suppress quality.json save when --incognito is active - main: handle UserConfigDir error in quality save defer instead of silently using wrong path - test: add TestSessionStore_Load/Save_RejectsPathTraversal	2026-04-06 00:04:09 +02:00
vikingowl	4596ea2156	feat: wire --resume/-r CLI flags, SessionStore, quality persistence - Add --resume/-r flags; empty = list sessions, ID = restore specific session - Create SessionStore from config.ProjectRoot() and cfg.Session.MaxKeep - Wire SessionID and Store into session.NewLocal - Restore QualityTracker EMA data from ~/.config/gnoma/quality.json at startup - Persist QualityTracker data to quality.json via defer on process exit	2026-04-05 23:52:05 +02:00
vikingowl	2f60bd9f0a	feat: LocalConfig + auto-save hook in session.Local Refactor NewLocal to accept LocalConfig (matching engine/router patterns), add persistence fields (SessionID, Store, Incognito, Logger), capture finalState before releasing the lock to avoid data races, and auto-save a Snapshot after each successful turn when a store is configured. Add SessionID() to the Session interface and three new tests covering auto-save, no-store no-panic, and SessionID accessors.	2026-04-05 23:46:48 +02:00
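
The tier ordering described in 6883c2a041 and reworked in a14fe8b504 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Arm` and `Task` structs, the `feasible` and `selectArm` helpers, and the arm names are hypothetical stand-ins, and the quality/cost scoring within a tier is left out. Only the tier numbers, the "zero = no ceiling" rule, and the 0.3 ceiling for the SLM arm come from the commit messages.

```go
package main

import "fmt"

// Arm is a hypothetical, trimmed-down stand-in for the router's arm type.
type Arm struct {
	Name          string
	IsCLIAgent    bool
	IsLocal       bool
	MaxComplexity float64 // zero means "no ceiling", as in a9213ec382
}

// Task is a hypothetical stand-in that carries only the complexity score.
type Task struct {
	ComplexityScore float64
}

// feasible applies the MaxComplexity exclusion: an arm with a ceiling
// drops out when the task's score exceeds that ceiling.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.ComplexityScore <= a.MaxComplexity
}

// armTier returns 0 for arms whose ceiling fits the task, then
// 1 for CLI agents, 2 for local models, and 3 for API providers.
func armTier(a Arm, t Task) int {
	switch {
	case a.MaxComplexity > 0 && t.ComplexityScore <= a.MaxComplexity:
		return 0
	case a.IsCLIAgent:
		return 1
	case a.IsLocal:
		return 2
	default:
		return 3
	}
}

// selectArm picks the feasible arm with the lowest tier. Ties keep the
// earlier arm; per-tier scoring is deliberately omitted from this sketch.
func selectArm(arms []Arm, t Task) (Arm, bool) {
	var best Arm
	bestTier, found := 0, false
	for _, a := range arms {
		if !feasible(a, t) {
			continue
		}
		if tier := armTier(a, t); !found || tier < bestTier {
			best, bestTier, found = a, tier, true
		}
	}
	return best, found
}

func main() {
	arms := []Arm{
		{Name: "claude", IsCLIAgent: true},
		{Name: "ollama", IsLocal: true},
		{Name: "slm", IsLocal: true, MaxComplexity: 0.3},
		{Name: "api"},
	}
	for _, score := range []float64{0.1, 0.8} {
		a, _ := selectArm(arms, Task{ComplexityScore: score})
		fmt.Printf("complexity %.1f -> %s\n", score, a.Name)
	}
}
```

Under these assumptions a task scored 0.1 goes to the small arm, while a task scored 0.8 excludes that arm and falls back to the original CLI > local > API order.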

84 Commits