gnoma

Author	SHA1	Message	Date
vikingowl	b5062d59e9	docs(readme): hero screenshot, differentiators, status, TOC Add docs/img/gnoma-tui.png as a hero image so visitors see the TUI above the fold instead of a wall of text. Pull the bandit router, prefer-policy, SLM, and built-in firewall out of buried sections into a 'What makes gnoma different' bullet list. Add a Status block flagging pre-1.0 and a table of contents. Move the pygmy-owl naming note and upstream/mirror URLs into a footer About section.	2026-05-24 15:39:14 +02:00
vikingowl	b13a6a2801	docs(plans): mark v0.3.0 plans shipped Three plans shipped end-to-end in v0.3.0; removing them from TODO.md In-flight and adding a Status: shipped header to each plan doc with the commit references. Shipped: - 2026-05-23-routing-defaults-refresh.md - 2026-05-23-prefer-routing-policy.md - 2026-05-23-startup-safety-banner.md Still in flight (telemetry-gated, fires only if measurements support it): - 2026-05-23-tool-router-specialization.md	2026-05-23 22:45:05 +02:00
vikingowl	c483656681	docs(plans): fix gnoma one-shot invocation in safety-banner plan gnoma takes the prompt as a positional argument, not via -p (that's Claude Code's syntax). Surfaced when the maintainer tried the manual smoke from the plan's "Definition of done" section and hit the "flag provided but not defined: -p" error. before: gnoma -p "test" after: gnoma "test" The same wrong syntax appears in the `f9094f6` / `3eeb5b4` commit messages but those are immutable. This commit also serves as the public record of the typo so future readers don't repeat it.	2026-05-23 22:26:56 +02:00
vikingowl	d206b3cf09	docs: routing-prefer + startup-safety user docs, plan tier-shift note README: - New "Preferring local vs cloud" subsection under "Routing defaults" — table of the three [router].prefer values, priority order against forced arm / incognito / Strengths, and the CLI-agent-counts-as-local clarification. - New "Startup safety check" subsection under "Security" — tier table, [safety] config block, --dangerously-allow-anywhere flag, container detection note, link to the plan doc. Plan doc (prefer-routing-policy): - Approach section updated to describe the tier-shift mechanism that actually shipped, with a clear "Implementation note" explaining why the original score-multiplier approach was abandoned (cost-floor math gives local arms a ~280x raw-score advantage that no reasonable multiplier can overcome). - CLI-agent placement flipped from "non-local" to "local" with rationale — the implementation chose the user-facing behavior axis over the privacy axis the original draft used. - Tier-shift rationale table replacing the multiplier rationale. - P-3 task rewritten to reflect the actual implementation (checked off and pointing at the right code), with the policyMultiplier helper noted as a within-tier nudge of limited present effect. The implementation-vs-plan deviation is now documented in both the plan doc and the original feature commit message (`f9094f6`). Future readers reach the same understanding via either path.	2026-05-23 22:23:57 +02:00
vikingowl	162c8b1017	docs(plans): prefer-routing-policy and startup-safety-banner Two parallel pre-flight plans surfaced in the 2026-05-23 session, both deferred while the routing-defaults-refresh implementation landed. Drafted as separate plans because they're independent: the prefer-policy is a router scoring change; the safety banner is a launch-time check that never touches the router. prefer-routing-policy [router].prefer = "local" | "cloud" | "auto" — soft score multiplier (0.3 / 0.5 / 1.0) biasing toward local or cloud arms while preserving Strengths cross-tier promotion and bandit learning. Default "auto" is byte-identical to current behavior. Forced arms and incognito retain priority. CLI-agent subprocess arms count as non-local for this knob (they proxy to cloud). startup-safety-banner Three-tier cwd classification at launch — refuse in /etc, /sys, and other system roots; warn+keypress in $HOME, /tmp, ~/Desktop, ~/Downloads; OK inside any git repo or directory with a project marker (.gnoma/, go.mod, package.json, etc.). Always shows a context banner with cwd, git state, model, modes, and a top-level sensitive-file inventory (.env, id_rsa, *.pem, .ssh/, etc. — informational only, no recursion, capped at 1000 entries). Bypass via --dangerously-allow-anywhere. Complements the in-flight sensitive-content unified-policy TODO item: this is the pre-flight layer, that is the runtime input-path layer. Both plans default-on with safe defaults; both have explicit out-of-scope sections to prevent scope creep during implementation. Linux + macOS first; Windows path classification deferred. TODO.md surfaces both as in-flight.	2026-05-23 22:00:21 +02:00
vikingowl	a79e99199d	feat(router): non-chat exclude, vision prefixes, family-defaults scaffold Discovery previously registered every model returned by Ollama as a chat arm, including embeddings, ASR, TTS, audio realtime, and rerankers — which then failed at inference time when the router selected them. Local arms also shipped with all-zero defaults, so selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b was effectively random. This change covers tasks R-1, R-2, R-6 from the routing-defaults plan. - nonChatModelPatterns + isNonChatModel substring matcher; matched IDs are skipped during RegisterDiscoveredModels. Covers whisper, moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding, embeddinggemma, -reranker, lfm2. - knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3 and minicpm-v entries stay for regression coverage. - New internal/router/defaults.go with FamilyDefaults struct, knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b resolves to "tiny3.5"). Single entry for now: functiongemma is registered with Disabled=true and MaxComplexity=0.40, reserved for the future ArmRoleToolRouter path. Table will grow in R-3. - RegisterDiscoveredModels consults ResolveFamilyDefaults and only populates fields that are still zero on the arm, so user [[arms]] overrides keep priority. Plans: - docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md - docs/superpowers/plans/2026-05-23-tool-router-specialization.md TODO.md surfaces both as in-flight items.	2026-05-23 21:24:59 +02:00
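The two lookups described here, a substring matcher that skips non-chat models and a longest-prefix family lookup with namespace stripping, can be sketched in Go as below. The pattern list and the `functiongemma` entry come from this commit; the `tiny` and `tiny3.5` entries, the lower-casing, and the `resolveFamily` helper are hypothetical, added only to exercise the lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// Substrings that mark a discovered model ID as non-chat (list from the commit message).
var nonChatModelPatterns = []string{
	"whisper", "moonshine", "kokoros", "vibevoice",
	"-asr", "-tts", "-audio", "-embedding", "embeddinggemma", "-reranker", "lfm2",
}

func isNonChatModel(id string) bool {
	lower := strings.ToLower(id)
	for _, p := range nonChatModelPatterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

type FamilyDefaults struct {
	Disabled      bool
	MaxComplexity float64
}

var knownFamilyDefaults = map[string]FamilyDefaults{
	"functiongemma": {Disabled: true, MaxComplexity: 0.40},
	"tiny":          {}, // hypothetical, for the longest-prefix demo only
	"tiny3.5":       {}, // hypothetical, for the longest-prefix demo only
}

// resolveFamily strips an "org/" namespace and returns the longest family key
// that prefixes the remaining ID, or "" when nothing matches.
func resolveFamily(id string) string {
	if i := strings.LastIndex(id, "/"); i >= 0 {
		id = id[i+1:]
	}
	id = strings.ToLower(id)
	best := ""
	for key := range knownFamilyDefaults {
		if strings.HasPrefix(id, key) && len(key) > len(best) {
			best = key
		}
	}
	return best
}

func ResolveFamilyDefaults(id string) (FamilyDefaults, bool) {
	key := resolveFamily(id)
	if key == "" {
		return FamilyDefaults{}, false
	}
	return knownFamilyDefaults[key], true
}

func main() {
	for _, id := range []string{"reecdev/tiny3.5:1.5b", "qwen3-embedding:0.6b", "functiongemma:270m", "phi-4:14b"} {
		if isNonChatModel(id) {
			fmt.Printf("%-22s skipped (non-chat)\n", id)
			continue
		}
		fam, ok := ResolveFamilyDefaults(id)
		fmt.Printf("%-22s family=%q found=%v defaults=%+v\n", id, resolveFamily(id), ok, fam)
	}
}
```

Taking the longest matching key keeps a specific family such as `tiny3.5` from being shadowed by a shorter, more generic prefix, regardless of map iteration order.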
vikingowl	49d80cf847	feat(security): format-aware entropy safelist (Phase F-1) Add a deterministic pre-extractor that skips known-safe token shapes before they reach the entropy scorer. Targets the false-positive regime that bites under lowered entropy_threshold or redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests (~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs. Config knob lives under the existing security section to match entropy_threshold / redact_high_entropy convention: [security] entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"] Empty / unset preserves pre-F-1 behaviour exactly — users opt in. Per-pattern Debug telemetry fires on every skip (pattern name + token length, never the token bytes). This is the data F-2's go/no-go gate depends on; the plan literally specifies it. NewFirewall validates names at the config boundary and emits a Warn for unknown entries so a typo like "uid" instead of "uuid" surfaces loudly instead of silently disabling FP reduction. Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold, mixed payload (safe shape + real secret) preserves the secret, secret-adjacent-to-UUID regression guard, empty safelist preserves pre-F-1 behaviour, unknown name silently dropped at scanner level but warned at firewall level, end-to-end FirewallConfig wiring, and the skip-telemetry log line. F-2 remains gated on real-workload FP-rate observations.	2026-05-22 12:39:10 +02:00
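A sketch of a format-aware safelist running ahead of a per-character Shannon entropy score, in Go. The four pattern names match the `entropy_safelist` values in this commit; the regular expressions, the `safeShape`/`isHighEntropy` helpers, and the log message text are assumptions, not gnoma's scanner.

```go
package main

import (
	"fmt"
	"log/slog"
	"math"
	"regexp"
)

// Hypothetical shape patterns for the four safelist names in this commit.
var safelistPatterns = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^(?:[0-9a-f]{40}|[0-9a-f]{64})$`), // SHA-1 or SHA-256
	"iso8601": regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$`),
	"url":     regexp.MustCompile(`^https?://\S+$`),
}

// shannonEntropy returns the per-character Shannon entropy of s in bits.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// safeShape reports the first enabled pattern that matches token. Unknown names
// are ignored here; warning about them belongs at the config boundary.
func safeShape(token string, enabled []string) (string, bool) {
	for _, name := range enabled {
		re, ok := safelistPatterns[name]
		if ok && re.MatchString(token) {
			// Log the pattern name and token length only, never the token bytes.
			slog.Debug("entropy safelist skip", "pattern", name, "len", len(token))
			return name, true
		}
	}
	return "", false
}

// isHighEntropy runs the safelist before the entropy scorer.
func isHighEntropy(token string, enabled []string, threshold float64) bool {
	if _, skip := safeShape(token, enabled); skip {
		return false
	}
	return shannonEntropy(token) >= threshold
}

func main() {
	enabledNames := []string{"uuid", "sha_hex", "iso8601", "url"}
	for _, tok := range []string{"123e4567-e89b-12d3-a456-426614174000", "da39a3ee5e6b4b0d3255bfef95601890afd80709"} {
		fmt.Printf("%s entropy=%.2f flagged=%v\n", tok, shannonEntropy(tok), isHighEntropy(tok, enabledNames, 3.0))
	}
}
```

An empty `enabled` slice never skips anything, which mirrors the opt-in behaviour: with the knob unset, every token reaches the entropy scorer as before.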
vikingowl	7d0e35b0f4	docs: record Phase F external validation, surface in active TODOs	2026-05-20 19:15:49 +02:00
vikingowl	8d6e66533b	docs(plans): add Phase F entropy FP reduction to post-SLM plan	2026-05-20 10:06:43 +02:00
vikingowl	5170c73dac	docs: refresh README/CONTRIBUTING/AGENTS/TODO, add LICENSE, drop obsolete files Top-level docs were stale and the .gitea/ issue templates referenced a workflow that is no longer in use. - README: rewrite around the current feature set (SLM routing, profiles, plugin TOFU, SafeProvider boundary, current model defaults). Add a pre-built-binary install section plus Docker (ghcr.io) install path for users without a Go toolchain. Document the GitHub mirror. - CONTRIBUTING: drop the dead issue-template reference, note Gitea upstream + GitHub mirror split, expand the package map and test-target table. - AGENTS: rebuild as a domain glossary (Elf / Arm / Turn / SafeProvider / Incognito / Profile) plus non-obvious conventions an outside agent needs and would not infer from the code. - TODO: trim completed waves into a History section, fix a broken link to the never-written Wave 3 plan file, surface active backlog. - docs/essentials/INDEX: add ADR-004 (PostToolUse hook ordering) to the ADR list. - LICENSE + NOTICE: adopt Apache License 2.0. Patent grant matters because gnoma bundles SDKs from Anthropic / OpenAI / Google / Mistral and ships derivative tooling that runs untrusted MCP servers. - Delete .gitea/issue_template/ and gemma-integration-analysis.md (latter is obsolete per its own preamble — Node.js-specific notes that don't apply to the Go implementation).	2026-05-20 03:13:40 +02:00
vikingowl	129d4f1ea6	chore: remove TinyLlama and set tiny3.5 (Qwen2.5 0.5B) as default SLM	2026-05-20 00:26:58 +02:00
vikingowl	f8c85a26e9	docs(security): ADR-004 PostToolUse hook ordering + invariant test Closes the last remaining 2026-05-19 audit finding by documenting the existing transitive guarantee rather than restructuring the hook contract. The audit observed that PostToolUse hooks receive raw tool output before the firewall scan runs, and proposed reordering or splitting the event into raw-local-only and redacted-for-LLM variants. After Wave 1 (SafeProvider boundary at every router arm + non-engine provider consumer), the audit's threat model is closed transitively: - Shell hooks see raw output but never reach an LLM. - Prompt hooks route Stream calls through routerStreamer → router → arm.Provider, every arm.Provider is now *SafeProvider, outgoing messages are scanned at the boundary. - Agent hooks spawn an elf whose engine has Firewall set; buildRequest scans inline. Reordering would regress legitimate shell-hook use cases (audit, forensic, local alert) that need raw access. Splitting the contract forces every existing hook config to migrate and introduces a wrong-variant footgun. Neither is justified by the residual risk. Three changes ship with the ADR: - ADR-004 records the decision and the conditions for re-opening it. - Doc comments on hook.PostToolUse and the dispatcher call site in the engine point at the ADR. - internal/hook/posttooluse_redaction_test.go locks in the invariant: a prompt PostToolUse hook firing on a secret-bearing tool result produces a redacted prompt at the inner provider. If this test fails, ADR-004's Position A is no longer correct and the audit finding re-opens.	2026-05-19 23:28:25 +02:00
vikingowl	3ae40083f1	docs(security): Wave 2 plan — incognito coherence Plan for the second hardening wave. Six findings closed in one PR: W2-1 router rejects forced non-local under local-only; W2-2 persist store consults IncognitoMode + 0o600/0o700 perms; W2-3 TUI seeds incognito from firewall; W2-4 quality/outcome gates read firewall instead of CLI flag; W2-5 session perms 0o600; W2-6 remove dead IncognitoMode.LocalOnly field.	2026-05-19 22:44:20 +02:00
vikingowl	8dcca64e41	feat(security): add SafeProvider boundary wrapper (W1-1) Introduces internal/security/SafeProvider — a provider.Provider decorator that scans outgoing messages and the system prompt through the firewall before delegating to the inner provider. Tool-result redaction stays in the engine because it needs per-tool context the boundary lacks. FirewallRef provides a late-binding atomic.Pointer[Firewall] so the wrapper can be installed before NewFirewall runs in main. A nil or unset ref makes SafeProvider a pass-through — preserves the current init order without lock contention or panics. Wave 1 of the post-audit hardening plan (docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md). Closes the architectural critique that secret scanning only ran inside engine.buildRequest(), leaving SLM/summarizer/hook/routerStreamer paths to send raw payloads. This commit only ships the wrapper; W1-2 and W1-3 will wire it through main and the four bypass sites.	2026-05-19 22:28:46 +02:00
vikingowl	d84b295da2	feat(tui): /profile slash command + status-bar profile badge (Phase C-3) Adds the in-TUI surface for the profile system: - Status bar carries " · profile: <name>" next to the SLM badge when profile mode is engaged (renders nothing in legacy single-config installations). - /profile (no args) shows the active profile and lists available ones. - /profile <name> switches by re-executing gnoma via syscall.Exec under --profile <name>. Critical cleanups (quality.json snapshot, SLM backend Close, session.Close) fire explicitly before exec since defers don't run after exec replaces the process image. Using syscall.Exec rather than a child process avoids stacking a process level on every switch and propagates the new gnoma's exit code directly to the shell. - Autocomplete after "/profile " offers configured profile names; the completion source is threaded from main.go via tui.Config. Conversation history is not preserved across a switch — profile change implies different context, different keys, different permission mode, so a clean reset is the correct semantic.	2026-05-19 21:59:11 +02:00
vikingowl	8450005b31	feat(cli): gnoma profile list/show subcommands (Phase C-2) `profile list` enumerates configured profiles and marks default + active. `profile show <name>` prints the merged effective config the profile would produce — sections, configured key names (values never), CLI agent overrides, arms, hooks, MCP servers, per-profile quality and session paths. Both commands work as a recovery affordance when profile resolution is broken: list flags a missing-default explicitly with "<name> (default, missing)", and the dispatcher falls back to a base-only load (new gnomacfg.LoadBase) so the diagnostics still run. API key values are filtered out of `profile show` — the output is safe to paste in a help channel or attach to a bug report.	2026-05-19 21:44:50 +02:00
vikingowl	635dad660c	feat(config): per-profile config layering with --profile flag (Phase C-1) Adds opt-in user profiles for swapping API keys, CLI binaries, and permission modes between contexts (work/private/experiment/...). Profile mode engages only when ~/.config/gnoma/profiles/ exists, so existing single-config installations are untouched. Selection order: --profile flag → default_profile in base config → fatal error. Layering: defaults → ~/.config/gnoma/config.toml → profiles/<name>.toml → <projectRoot>/.gnoma/config.toml → env. Map sections merge per-key; [[arms]] and [[mcp_servers]] merge by id/name; [[hooks]] appends. Per-profile data: quality-<name>.json and sessions/<name>/ keep the bandit and session list from cross-contaminating between profiles. Profile names restricted to [A-Za-z0-9_-] to block --profile=../foo path traversal into derived paths.	2026-05-19 21:35:33 +02:00
vikingowl	0aabd19906	feat(router): per-arm strengths + cost weight (Phase D) Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md (static portion; dynamic bandit-driven promotion deferred to D-2). Routing previously let tier ordering (CLI > local > API) dominate selection — Opus, in tier 3, would lose to a tier-1 CLI agent for SecurityReview even though Opus is empirically stronger at that task. This change introduces explicit per-arm overrides: [[arms]] id = "anthropic/claude-opus-4-7" strengths = ["security_review", "planning"] cost_weight = 0.3 Strengths gate cross-tier promotion: arms matching task.Type bypass the tier loop and compete with each other directly. Promotion is a preference, not a pin — if no strength-tagged arm is feasible (backoff, pool capacity, tool support), selection falls through to the default tier order. CostWeight linearly dampens the cost penalty in scoreArm via effectiveCost = 1 + CostWeight * (cost - 1) CostWeight=1.0 (or unset) preserves current behavior; lower values trade cheapness for quality. The earlier draft used cost^CostWeight which inverts direction for sub-1 local-arm costs (raising a fraction <1 to a fractional power makes it bigger, not smaller); a monotonicity regression test prevents that drift. - internal/router/arm.go: Strengths []TaskType, CostWeight float64, HasStrength(), ResolvedCostWeight() (zero → 1.0). - internal/router/selector.go: scoreArm strength bonus const (strengthScoreBonus = 0.15) + linear cost dampening; selectBest cross-tier promotion before tier loop. - internal/router/router.go: ArmOverride type + ApplyArmOverrides() returns unknown IDs; unknown strength names skipped with per-name warning via slog. - internal/router/task.go: ParseTaskTypeStrict() returns ok bool; ParseTaskType now delegates so the two switches stay in sync. - internal/config/config.go: ArmConfig + [[arms]] TOML wiring. - cmd/gnoma/main.go: applies overrides after all initial arms register; logs a warning when an [[arms]] id has no matching registered arm. Tests cover: predicate helpers, scoring direction across two arms, linear-formula monotonicity on both sides of cost=1, cross-tier promotion, empty-Strengths preserves tier order, promoted arm in backoff falls through via full Router.Select path, observed-quality tiebreak between two strength-tagged arms, ApplyArmOverrides happy path + unknown-ID reporting + unknown-strength skipping.	2026-05-19 21:14:45 +02:00
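The linear dampening formula and the zero-means-unset rule translate directly into Go. `Arm` here is a trimmed stand-in with `TaskType` reduced to a string, and `strengthCandidates` is a hypothetical helper illustrating the promote-or-fall-through rule, not gnoma's `selectBest`.

```go
package main

import "fmt"

// Arm mirrors the two new fields; TaskType is a plain string here (assumption).
type Arm struct {
	ID         string
	Strengths  []string
	CostWeight float64
}

func (a Arm) HasStrength(taskType string) bool {
	for _, s := range a.Strengths {
		if s == taskType {
			return true
		}
	}
	return false
}

// ResolvedCostWeight maps the zero value (unset) to 1.0, which keeps current behavior.
func (a Arm) ResolvedCostWeight() float64 {
	if a.CostWeight == 0 {
		return 1.0
	}
	return a.CostWeight
}

// effectiveCost implements the linear dampening from the commit message:
// weight 1 returns cost unchanged, weight 0 flattens every arm to 1.
func effectiveCost(cost, weight float64) float64 {
	return 1 + weight*(cost-1)
}

// strengthCandidates returns feasible arms tagged for taskType. An empty
// result means selection falls through to the default tier order.
func strengthCandidates(arms []Arm, taskType string, feasible func(Arm) bool) []Arm {
	var out []Arm
	for _, a := range arms {
		if a.HasStrength(taskType) && feasible(a) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	opus := Arm{ID: "anthropic/claude-opus-4-7", Strengths: []string{"security_review", "planning"}, CostWeight: 0.3}
	for _, cost := range []float64{0.5, 1, 4} {
		fmt.Printf("cost %.1f -> effective %.2f\n", cost, effectiveCost(cost, opus.ResolvedCostWeight()))
	}
	all := []Arm{opus, {ID: "cli/claude"}}
	promoted := strengthCandidates(all, "security_review", func(Arm) bool { return true })
	fmt.Println("promoted:", len(promoted))
}
```

With weight 0.3, a cost of 4 becomes 1.9 and a cost of 0.5 becomes 0.85: both sides are pulled toward 1, and for any positive weight the result stays strictly increasing in cost.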
vikingowl	b331dcd61a	feat(subprocess): per-agent binary override via [cli_agents] config Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Users with aliased CLI binaries (claude-priv, claude-work, gemini-personal) can now point gnoma's auto-discovery at them without renaming. The override flows through to the actual subprocess spawn at internal/provider/subprocess/provider.go:56, so routing through the alias is functional, not cosmetic. Config: [cli_agents] claude = "claude-priv" # discovery uses claude-priv instead of claude gemini = "" # empty value = no override (fall back to canonical) # vibe is absent = canonical name used - internal/config/config.go: CLIAgentsSection map[string]string; TOML [cli_agents] key. - internal/provider/subprocess/agent.go: - Package-level lookPath = exec.LookPath for test injection. - resolveAgentBinary(canonical, override) → (path, binName, err). Override='' falls back to canonical. Override set but missing from PATH returns an error (no silent fallback — masks user typos). - DiscoveredAgent.OverrideBinary records the override binary name when one was used; empty otherwise. - DiscoverCLIAgents(ctx, overrides) signature; warning logged when an override is configured but the binary isn't on PATH. - cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The `gnoma providers` listing renders `claude-priv (via [cli_agents].claude)` when an override is in effect. Tests cover: 5 resolver cases (no override, override set, empty override falls back, override missing, canonical missing); 4 discovery cases (no overrides, override resolves alias, empty value falls back, override missing skips agent); 2 config round-trip cases.	2026-05-19 21:02:16 +02:00
vikingowl	43ea2e562d	feat(engine): two-stage tool routing for small local arms Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Small local SLMs (<=16k context) waste ~1500 tokens per turn on the full tool catalogue. Two-stage routing replaces round-1 tools with a single synthetic select_category schema; round-2+ sends only the selected category's real tool schemas plus select_category for re-selection. - internal/tool/category.go: Category type, optional Categorized interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read, fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec. - internal/engine/twostage.go: synthetic select_category tool, intercept helper, per-turn selectedCategory state under e.mu. - Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to prose. State resets at the top and end of every runLoop. - Activates automatically on a forced local arm with ContextWindow <=16384, or via [router].force_two_stage TOML key. - Integration test drives a 3-round trip and asserts: round 1 emits exactly one schema (synthetic) with ToolChoiceRequired, round 2 contains only write-category schemas + select_category, real fs.write executes. Invalid-category fallback round-trips back to round-1 mode.	2026-05-19 20:53:21 +02:00
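A Go sketch of the category split and per-round tool selection. The tool-to-category mapping and the 16384-token threshold come from this commit; the `Tool`/`builtin`/`plain` types, the assumption that uncategorized tools fall back to a `meta` category, and `toolsForRound` are illustrative stand-ins, not gnoma's engine code.

```go
package main

import "fmt"

type Category string

const (
	CatRead   Category = "read"
	CatWrite  Category = "write"
	CatSearch Category = "search"
	CatExec   Category = "exec"
	CatMeta   Category = "meta" // assumed fallback for uncategorized tools
)

// Tool is a minimal stand-in for gnoma's tool interface.
type Tool interface{ Name() string }

// Categorized is the optional interface a tool can implement.
type Categorized interface{ Category() Category }

// builtin implements Categorized; plain does not.
type builtin struct {
	name string
	cat  Category
}

func (b builtin) Name() string       { return b.name }
func (b builtin) Category() Category { return b.cat }

type plain struct{ name string }

func (p plain) Name() string { return p.name }

func CategoryOf(t Tool) Category {
	if c, ok := t.(Categorized); ok {
		return c.Category()
	}
	return CatMeta
}

// useTwoStage mirrors the activation rule: a forced local arm with a context
// window of at most 16384 tokens, or the force_two_stage switch.
func useTwoStage(forcedLocal bool, contextWindow int, forceTwoStage bool) bool {
	return forceTwoStage || (forcedLocal && contextWindow <= 16384)
}

// toolsForRound returns the schema names to send. With no category selected
// (round 1) only the synthetic select_category tool is offered.
func toolsForRound(tools []Tool, selected Category) []string {
	if selected == "" {
		return []string{"select_category"}
	}
	var out []string
	for _, t := range tools {
		if CategoryOf(t) == selected {
			out = append(out, t.Name())
		}
	}
	return append(out, "select_category")
}

func main() {
	toolset := []Tool{
		builtin{"fs.read", CatRead}, builtin{"fs.ls", CatRead},
		builtin{"fs.write", CatWrite}, builtin{"fs.edit", CatWrite},
		builtin{"fs.glob", CatSearch}, builtin{"fs.grep", CatSearch},
		builtin{"bash", CatExec}, plain{"mcp__docs__search"},
	}
	if useTwoStage(true, 8192, false) {
		fmt.Println("round 1:", toolsForRound(toolset, ""))
		fmt.Println("round 2 (write):", toolsForRound(toolset, CatWrite))
	}
}
```

Keeping `select_category` in every later round is what allows the model to switch categories mid-turn without restarting the two-stage cycle.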
vikingowl	21da29e73e	docs(plan): capture post-SLM-unlock outstanding work New dated plan at docs/superpowers/plans/2026-05-19-post-slm-unlock.md covers the work surfaced during this session that hasn't shipped yet: Phase A — two-stage tool routing (last item from the original smallcode audit; gates on local + small-context arms; saves ~70% of schema tokens per request). Phase B — CLI agent binary override. [cli_agents] config section lets users map canonical agent names (claude / gemini / vibe) onto local aliases (claude-priv, gemini-work, etc.). Phase C — user profiles. Multiple named configs (work / private / experiment) layered over a base config.toml, switchable via --profile flag, [config].default_profile, and a /profile TUI command. Phase D — per-arm capability tags (Phase-4 prep). Per-arm Strengths []TaskType and CostWeight to make the router actually pick Opus over Gemini for Planning/SecurityReview etc., not just for cost reasons. Phase E — compound tools (deferred until SLM-arm telemetry shows which chain patterns fail). Plus an explicit drop list of things we considered and won't ship. TODO.md updated to point at the new plan and note that the original roadmap's Phase 4 is now superseded.	2026-05-19 19:31:40 +02:00
vikingowl	a14fe8b504	feat(slm): pluggable backends + trivial-prompt routing The SLM had two intended jobs — classify every prompt and execute the small ones itself — but in practice three independent gates kept it out of nearly all real work: 1. llamafile's cold start kept the SLM out of pipe-mode runs (they always finished before the 15 s health check did) 2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm (ToolUse=false) from 9/10 task types 3. armTier hard-coded CLI agents > local > API, so even when the SLM arm was feasible a CLI agent won Each gate is addressed below. The result is an SLM that actually does its job — small stuff stays local, complex stuff routes up — gated by arm capability rather than by accidents of the boot order. Backend layer (the bigger change) The original implementation hard-coded llamafile. That's fine if you have nothing else, but most users with a local model setup already run Ollama or llama.cpp. The new factory at internal/slm/backend.go picks between: - ollama (any local Ollama daemon) - llamacpp (any llama.cpp server) - llamafile (gnoma-managed, current behaviour) - openaicompat (LM Studio, vLLM, remote API) - auto (probes in order, picks first reachable) - disabled [slm].backend in config.toml selects which. Documented in docs/slm-backends.md with copy-paste presets for each. The factory probes the underlying model's actual capabilities (Ollama /api/show, llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the arm picks up simple file-read style tasks on tool-capable models and stays knowledge-only on completion-only models. Trivial-prompt heuristic (Gate 2) ClassifyTask now flips RequiresTools=false for short, low-complexity prompts whose task type doesn't imply existing code (Explain, Generation, Boilerplate). Tool-needing tokens (read, write, run, test, file, …) keep RequiresTools=true even when the prompt is brief. Complexity-aware tier ordering (Gate 3) armTier takes a Task and returns tier 0 for arms whose MaxComplexity ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3. For trivial tasks the SLM arm wins; for complex tasks the SLM falls out of the feasible set (MaxComplexity exclusion) and the original ordering reasserts. Eager boot with user-facing wait (Gate 1) Removed the original goroutine-only path. SLM startup now blocks synchronously inside the factory; for llamafile that means up to [slm].startup_timeout (default 5 s) of waiting on the first invocation, with "Starting SLM…" → "SLM ready (backend, model, tools, boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp backends boot instantly because the daemon is already running. waitHealthy() now respects the caller's context deadline instead of its old hardcoded 15 s ceiling. Classifier reliability Classifier timeout bumped 2 s → 5 s for thinking-mode models like Qwen3-distilled Tiny3.5. System prompt includes /no_think directive for the same family. These help but don't eliminate small-model JSON-contract failures — see the docs section on picking a model. Probe + telemetry surfaces gnoma slm status now prints the configured backend + model + a live probe result (✓/✗) instead of just the llamafile manifest state. `gnoma router stats` already (from the previous commit) shows the classifier-source mix; with this change you can finally see slm / slm_fallback / heuristic share rise from "always heuristic" to something reflecting real SLM activity. Tests - 9 new backend-factory tests (httptest-backed Ollama probe, error paths, auto-detection, capability flags) - Tier-ordering tests cover the new "specialised small arm wins trivial task" path - Trivial-prompt heuristic tested for both halves (knowledge-only flips RequiresTools=false; debug/file/run keeps it true) Deletes the dead SLMManager field from the TUI Config — it was declared but never read.	2026-05-19 18:53:32 +02:00
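Gate 3 can be sketched as a feasibility filter followed by a stable sort on tier. The assumption here is that only arms with a nonzero `MaxComplexity` are eligible for tier 0, since zero means no ceiling; the `Arm`, `Task`, and `rank` types are simplified stand-ins for gnoma's router.

```go
package main

import (
	"fmt"
	"sort"
)

type ArmKind int

const (
	KindCLI ArmKind = iota
	KindLocal
	KindAPI
)

// Arm is a trimmed stand-in; MaxComplexity == 0 means no ceiling.
type Arm struct {
	ID            string
	Kind          ArmKind
	MaxComplexity float64
}

type Task struct{ ComplexityScore float64 }

// feasible drops arms whose ceiling the task exceeds.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.ComplexityScore <= a.MaxComplexity
}

// armTier puts specialised small arms whose ceiling fits the task in tier 0,
// then CLI agents, local arms, and API arms in tiers 1, 2, and 3.
func armTier(a Arm, t Task) int {
	if a.MaxComplexity > 0 && t.ComplexityScore <= a.MaxComplexity {
		return 0
	}
	switch a.Kind {
	case KindCLI:
		return 1
	case KindLocal:
		return 2
	default:
		return 3
	}
}

// rank filters to feasible arms and orders them by tier, keeping input order within a tier.
func rank(arms []Arm, t Task) []Arm {
	var out []Arm
	for _, a := range arms {
		if feasible(a, t) {
			out = append(out, a)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return armTier(out[i], t) < armTier(out[j], t) })
	return out
}

func main() {
	arms := []Arm{
		{ID: "claude-cli", Kind: KindCLI},
		{ID: "slm", Kind: KindLocal, MaxComplexity: 0.3},
		{ID: "opus", Kind: KindAPI},
	}
	for _, score := range []float64{0.1, 0.8} {
		var ids []string
		for _, a := range rank(arms, Task{ComplexityScore: score}) {
			ids = append(ids, a.ID)
		}
		fmt.Printf("complexity %.1f -> %v\n", score, ids)
	}
}
```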
vikingowl	dc438ea181	feat(plugin): trust-on-first-use manifest pinning Plugins are now verified against ~/.config/gnoma/plugins.pins.toml at load time. Each plugin's plugin.json bytes are hashed (SHA-256) and: - recorded automatically on first load (TOFU) with a prominent warning - compared on subsequent loads - refused with a clear error if the hash drifted, without overwriting the pin so the user can review and re-enrol deliberately Pin-store I/O failures degrade to load-without-pinning rather than locking the user out of previously-trusted plugins. Closes audit finding C2. See ADR-003 for the decision rationale and docs/plugins-trust.md for the end-user trust model.	2026-05-19 16:44:09 +02:00
vikingowl	a9213ec382	feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status - slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback, heuristic baseline blended so Priority/RequiredEffort are never zeroed, extractJSON strips markdown fences from small-model responses - router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration - router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior); filterFeasible excludes arms when task.ComplexityScore > MaxComplexity - config.SLMSection: [slm] enabled / model_url / data_dir - openaicompat.NewLlamafile: no API key, model = "default", no retries - slm.Manager: DefaultDataDir() (XDG), Manifest() accessor - cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm registered with MaxComplexity=0.3 when enabled + set up - tui: /config shows slm status (ready/missing/not set up + base URL if running) - docs: roadmap updated to reflect llamafile pivot from Ollama	2026-05-07 16:44:32 +02:00
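The fence-stripping step and the case-insensitive task-type parse can be sketched in Go as below. The JSON field names (`task_type`, `complexity`) and the set of known task types are assumptions, not the classifier's actual contract; a parse error is the signal to use the heuristic fallback.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is three backticks, built at runtime to keep this listing readable.
var fence = strings.Repeat("`", 3)

// extractJSON trims whitespace and, when the reply is wrapped in a markdown
// fence (optionally tagged, e.g. json), returns only the fenced body.
func extractJSON(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, fence) {
		return s
	}
	// Drop the opening fence line, including any language tag.
	i := strings.Index(s, "\n")
	if i < 0 {
		return ""
	}
	s = strings.TrimSpace(s[i+1:])
	s = strings.TrimSuffix(s, fence)
	return strings.TrimSpace(s)
}

// classification uses hypothetical field names for the classifier's JSON contract.
type classification struct {
	TaskType   string  `json:"task_type"`
	Complexity float64 `json:"complexity"`
}

func parseClassification(raw string) (classification, error) {
	var c classification
	err := json.Unmarshal([]byte(extractJSON(raw)), &c)
	return c, err
}

// Hypothetical subset of task-type names.
var knownTaskTypes = map[string]bool{
	"explain": true, "generation": true, "boilerplate": true,
	"debug": true, "planning": true, "security_review": true,
}

// parseTaskType is case-insensitive and maps unknown names to "generation".
func parseTaskType(s string) string {
	t := strings.ToLower(strings.TrimSpace(s))
	if knownTaskTypes[t] {
		return t
	}
	return "generation"
}

func main() {
	reply := fence + "json\n{\"task_type\": \"Explain\", \"complexity\": 0.2}\n" + fence
	c, err := parseClassification(reply)
	if err != nil {
		fmt.Println("fall back to heuristic:", err)
	} else {
		fmt.Println(parseTaskType(c.TaskType), c.Complexity)
	}
}
```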
vikingowl	5569d4fb86	docs: consolidated roadmap, ADR-013, drop stale plans - New 7-phase roadmap (2026-05-07-gnoma-roadmap.md) covering M8 cleanup, PTY interactive shell, SLM classifier, router revisit, USP security, ELF support, and distribution - ADR-013 (002-slm-routing.md): SLM-first routing supersedes ADR-009; Thompson Sampling deferred pending SLM production data - ADR-009 status updated to "Superseded by ADR-013" - gemma-integration-analysis.md: header note that Node.js specifics (LiteRT-LM, daemon, PID) don't apply to gnoma's Go implementation - TODO.md replaced with thin pointer to roadmap + stable backlog - Deleted stale plan/spec files: m6-m7-closeout, m8-hooks-design	2026-05-07 15:06:54 +02:00
vikingowl	e04cacc215	fix: append mutation, pipe-mode hang, Mistral regex false positives - Fix append footgun: allHooks/allMCPServers allocated fresh to avoid mutating cfg's backing array (lines 391/413 in main.go) - Fix pipe-mode permission prompt: detect no-TTY stdin and auto-deny instead of blocking forever on fmt.Scanln EOF - Tighten Mistral API key regex from bare [a-zA-Z0-9]{32} (matched commit hashes, UUIDs) to context-gated pattern requiring "mistral" keyword nearby. Added scanner test for positives and negatives. - Remove README demo GIF TODO placeholder - Unify version string: pass buildVersion from ldflags into tui.Config instead of hardcoding "v0.1.0-dev" - Populate benchmarks doc with actual Go benchmark results	2026-04-12 03:49:47 +02:00
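The append fix addresses a standard Go aliasing hazard: `append` reuses the destination's backing array when capacity allows, so appending to a config slice can overwrite elements that another caller appended. A minimal sketch with hypothetical type and function names:

```go
package main

import "fmt"

// HookConfig and the merge helpers are hypothetical names for illustration.
type HookConfig struct{ Name string }

// mergeShared appends onto base directly. When base has spare capacity, the
// result shares base's backing array, so two merges overwrite each other.
func mergeShared(base, extra []HookConfig) []HookConfig {
	return append(base, extra...)
}

// mergeFresh allocates a new slice, so base is never written to.
func mergeFresh(base, extra []HookConfig) []HookConfig {
	all := make([]HookConfig, 0, len(base)+len(extra))
	all = append(all, base...)
	return append(all, extra...)
}

func main() {
	cfgHooks := make([]HookConfig, 1, 4) // spare capacity, as a decoded config slice may have
	cfgHooks[0] = HookConfig{Name: "audit"}
	a := mergeShared(cfgHooks, []HookConfig{{Name: "plugin-a"}})
	b := mergeShared(cfgHooks, []HookConfig{{Name: "plugin-b"}})
	fmt.Println("shared:", a[1].Name, b[1].Name) // both report plugin-b
	c := mergeFresh(cfgHooks, []HookConfig{{Name: "plugin-a"}})
	d := mergeFresh(cfgHooks, []HookConfig{{Name: "plugin-b"}})
	fmt.Println("fresh:", c[1].Name, d[1].Name)
}
```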
vikingowl	6bb9c33d04	fix(m8): replace_default map, error UX, benchmarks, and launch prep - Fix replace_default positional bug: []string → map[string]string for explicit MCP tool → built-in name mapping - Improve error messages for missing API keys (3 actionable options) and unknown providers (early validation with available list) - Remove python3 dependency from MCP tests (pure bash grep/sed parsing) - Add router benchmark scaffold (6 benchmarks in bench_test.go + docs) - Add .goreleaser.yml for cross-platform binary releases with ldflags - Add launch-ready README with quickstart, extensibility docs, GIF placeholder - Add CONTRIBUTING.md and Gitea issue templates (bug report, feature request)	2026-04-12 03:34:58 +02:00
vikingowl	6c47f8643b	feat(m8): MCP client, tool replaceability, and plugin system Complete the remaining M8 extensibility deliverables: - MCP client with JSON-RPC 2.0 over stdio transport, protocol lifecycle (initialize/tools-list/tools-call), and process group management for clean shutdown - MCP tool adapter implementing tool.Tool with mcp__{server}__{tool} naming convention and replace_default for swapping built-in tools - MCP manager for multi-server orchestration with parallel startup, tool discovery, and registry integration - Plugin system with plugin.json manifest (name/version/capabilities), directory-based discovery (global + project scopes with precedence), loader that merges skills/hooks/MCP configs into existing registries, and install/uninstall/list lifecycle manager - Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out enabled/disabled resolution - TUI /plugins command for listing installed plugins - 54 tests across internal/mcp and internal/plugin packages	2026-04-12 03:09:05 +02:00
vikingowl	8d97c6cd39	docs: mark M8.2 skill system deliverables complete in milestones.md	2026-04-07 02:25:29 +02:00
vikingowl	24f4a739a6	docs: mark M8.1 hook system deliverables complete in milestones.md	2026-04-07 01:09:07 +02:00
vikingowl	fef38b3502	docs: M8.1 hook system design spec	2026-04-06 02:42:34 +02:00
vikingowl	2c0ff5ff1f	docs: mark M7 deliverables complete in milestones.md	2026-04-06 00:59:16 +02:00
vikingowl	43dcc7e9de	docs: M6/M7 close-out implementation plan — 8 tasks, TDD, full file map	2026-04-05 21:33:42 +02:00
vikingowl	252ffde732	docs: M6/M7 close-out design spec — tool persistence, tokenizer, router feedback, coordinator	2026-04-05 21:22:26 +02:00
vikingowl	abb3e3ca90	feat: spawn_elfs batch tool for guaranteed parallel elf execution New spawn_elfs tool takes an array of tasks and spawns all elfs simultaneously. Solves the problem of models (Mistral Small, Devstral) that serialize tool calls instead of batching them. Schema: {"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30} Also: - Suppress spawn_elfs tool output from chat (tree handles display) - Update M7 milestones to reflect completed deliverables - Add CC-inspired features to M8/M10: task notification system, task framework, /batch skill, coordinator mode, StreamingToolExecutor, git worktree isolation	2026-04-03 21:03:51 +02:00
vikingowl	8e5ddb20cb	feat: hybrid system inventory — dynamic PATH scan + runtime probing No hardcoded tool lists. Scans all $PATH directories for executables (5541 on this system), then probes known runtime patterns for version info (23 detected: Go, Python, Node, Rust, Ruby, Perl, Java, Dart, Deno, Bun, Lua, LuaJIT, Guile, GCC, Clang, NASM + package managers). System prompt includes: OS, shell, runtime versions, and notable tools (git, docker, kubectl, fzf, rg, etc.) from the full PATH scan. Total executable count reported so the LLM knows the full scope. Milestones updated: M6 fixed context prefix, M12 multimodality.	2026-04-03 14:36:22 +02:00
vikingowl	c54471a37b	refactor: migrate mistral sdk to github.com/VikingOwl91/mistral-go-sdk Same package, new GitHub deployment with fixed tests. somegit.dev/vikingowl → github.com/VikingOwl91, v1.2.0 → v1.2.1	2026-04-03 12:06:59 +02:00
vikingowl	69f5dba091	feat: complete M1 — core engine with Mistral provider Mistral provider adapter with streaming, tool calls (single-chunk pattern), stop reason inference, model listing, capabilities, and JSON output support. Tool system: bash (7 security checks, shell alias harvesting for bash/zsh/fish), file ops (read, write, edit, glob, grep, ls). Alias harvesting collects 300+ aliases from user's shell config. Engine agentic loop: stream → tool execution → re-query → until done. Tool gating on model capabilities. Max turns safety limit. CLI pipe mode: echo "prompt" | gnoma streams response to stdout. Flags: --provider, --model, --system, --api-key, --max-turns, --verbose, --version. Provider interface expanded: Models(), DefaultModel(), Capabilities (ToolUse, JSONOutput, Vision, Thinking, ContextWindow, MaxOutput), ResponseFormat with JSON schema support. Live verified: text streaming + tool calling with devstral-small. 117 tests across 8 packages, 10MB binary.	2026-04-03 12:01:55 +02:00
vikingowl	951ab3b970	docs: update essentials for router, security, task learning Restructure milestones from M1-M11 to M1-M15: - M3: Security Firewall (secret scanner, incognito mode) - M4: Router Foundation (arm registry, pools, task classifier) - M5: TUI with full 6 permission modes - M6: Full compaction (truncate + LLM summarization) - M9: Router Advanced (bandit learning, ensemble strategies) - M11: Task Learning (pattern detection, persistent tasks) Add ADR-007 through ADR-012 for security-as-core, router split, Thompson Sampling, MCP replaceability, task learning, incognito. Add risks R-010 through R-015 for router, security, feedback, task learning, ensemble quality, shell parser. Update architecture dependency graph with security, router, elf, hook, skill, mcp, plugin, tasklearn packages. Update domain model with Router, Arm, LimitPool, Firewall entities.	2026-04-03 10:47:11 +02:00
vikingowl	154d978564	docs: add project essentials (12/12 complete) Vision, domain model, architecture, patterns, process flows, UML diagrams, API contracts, tech stack, constraints, milestones (M1-M11), decision log (6 ADRs), and risk register. Key decisions: single binary, pull-based streaming, Mistral as M1 reference provider, discriminated unions, multi-provider collaboration as core identity.	2026-04-02 18:09:07 +02:00

40 Commits