Commit Graph

21 Commits

Author SHA1 Message Date
vikingowl 43ea2e562d feat(engine): two-stage tool routing for small local arms
Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.

Small local SLMs (<=16k context) waste ~1500 tokens per turn on the
full tool catalogue. Two-stage routing replaces round-1 tools with a
single synthetic select_category schema; round-2+ sends only the
selected category's real tool schemas plus select_category for
re-selection.

- internal/tool/category.go: Category type, optional Categorized
  interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read,
  fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec.
- internal/engine/twostage.go: synthetic select_category tool,
  intercept helper, per-turn selectedCategory state under e.mu.
- Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to
  prose. State resets at the top and end of every runLoop.
- Activates automatically on a forced local arm with ContextWindow
  <=16384, or via [router].force_two_stage TOML key.
- Integration test drives a 3-round trip and asserts: round 1 emits
  exactly one schema (synthetic) with ToolChoiceRequired, round 2
  contains only write-category schemas + select_category, real
  fs.write executes. Invalid-category fallback round-trips back to
  round-1 mode.
2026-05-19 20:53:21 +02:00
vikingowl 21da29e73e docs(plan): capture post-SLM-unlock outstanding work
New dated plan at docs/superpowers/plans/2026-05-19-post-slm-unlock.md
covers the work surfaced during this session that hasn't shipped yet:

Phase A — two-stage tool routing (last item from the original
smallcode audit; gates on local + small-context arms; saves ~70% of
schema tokens per request).

Phase B — CLI agent binary override. [cli_agents] config section lets
users map canonical agent names (claude / gemini / vibe) onto local
aliases (claude-priv, gemini-work, etc.).

Phase C — user profiles. Multiple named configs (work / private /
experiment) layered over a base config.toml, switchable via
--profile flag, [config].default_profile, and a /profile TUI command.

Phase D — per-arm capability tags (Phase-4 prep). Per-arm Strengths
[]TaskType and CostWeight to make the router actually pick Opus over
Gemini for Planning/SecurityReview etc., not just for cost reasons.

Phase E — compound tools (deferred until SLM-arm telemetry shows
which chain patterns fail).

Plus an explicit drop list of things we considered and won't ship.
TODO.md updated to point at the new plan and note that the original
roadmap's Phase 4 is now superseded.
2026-05-19 19:31:40 +02:00
vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile cold-start blocked pipe-mode runs (always faster than
     the 15 s health check)
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.

Probe + telemetry surfaces

gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` already (from the previous commit) shows the
classifier-source mix; with this change you can finally see slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00
vikingowl dc438ea181 feat(plugin): trust-on-first-use manifest pinning
Plugins are now verified against ~/.config/gnoma/plugins.pins.toml at
load time. Each plugin's plugin.json bytes are hashed (SHA-256) and:

- recorded automatically on first load (TOFU) with a prominent warning
- compared on subsequent loads
- refused with a clear error if the hash drifted, without overwriting
  the pin so the user can review and re-enrol deliberately

Pin-store I/O failures degrade to load-without-pinning rather than
locking the user out of previously-trusted plugins.

Closes audit finding C2. See ADR-003 for the decision rationale and
docs/plugins-trust.md for the end-user trust model.
2026-05-19 16:44:09 +02:00
vikingowl a9213ec382 feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
  heuristic baseline blended so Priority/RequiredEffort are never zeroed,
  extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
  filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
  registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
2026-05-07 16:44:32 +02:00
vikingowl 5569d4fb86 docs: consolidated roadmap, ADR-013, drop stale plans
- New 7-phase roadmap (2026-05-07-gnoma-roadmap.md) covering M8 cleanup,
  PTY interactive shell, SLM classifier, router revisit, USP security,
  ELF support, and distribution
- ADR-013 (002-slm-routing.md): SLM-first routing supersedes ADR-009;
  Thompson Sampling deferred pending SLM production data
- ADR-009 status updated to "Superseded by ADR-013"
- gemma-integration-analysis.md: header note that Node.js specifics
  (LiteRT-LM, daemon, PID) don't apply to gnoma's Go implementation
- TODO.md replaced with thin pointer to roadmap + stable backlog
- Deleted stale plan/spec files: m6-m7-closeout, m8-hooks-design
2026-05-07 15:06:54 +02:00
vikingowl e04cacc215 fix: append mutation, pipe-mode hang, Mistral regex false positives
- Fix append footgun: allHooks/allMCPServers allocated fresh to avoid
  mutating cfg's backing array (lines 391/413 in main.go)
- Fix pipe-mode permission prompt: detect no-TTY stdin and auto-deny
  instead of blocking forever on fmt.Scanln EOF
- Tighten Mistral API key regex from bare [a-zA-Z0-9]{32} (matched
  commit hashes, UUIDs) to context-gated pattern requiring "mistral"
  keyword nearby. Added scanner test for positives and negatives.
- Remove README demo GIF TODO placeholder
- Unify version string: pass buildVersion from ldflags into tui.Config
  instead of hardcoding "v0.1.0-dev"
- Populate benchmarks doc with actual Go benchmark results
2026-04-12 03:49:47 +02:00
vikingowl 6bb9c33d04 fix(m8): replace_default map, error UX, benchmarks, and launch prep
- Fix replace_default positional bug: []string → map[string]string for
  explicit MCP tool → built-in name mapping
- Improve error messages for missing API keys (3 actionable options) and
  unknown providers (early validation with available list)
- Remove python3 dependency from MCP tests (pure bash grep/sed parsing)
- Add router benchmark scaffold (6 benchmarks in bench_test.go + docs)
- Add .goreleaser.yml for cross-platform binary releases with ldflags
- Add launch-ready README with quickstart, extensibility docs, GIF placeholder
- Add CONTRIBUTING.md and Gitea issue templates (bug report, feature request)
2026-04-12 03:34:58 +02:00
vikingowl 6c47f8643b feat(m8): MCP client, tool replaceability, and plugin system
Complete the remaining M8 extensibility deliverables:

- MCP client with JSON-RPC 2.0 over stdio transport, protocol
  lifecycle (initialize/tools-list/tools-call), and process group
  management for clean shutdown
- MCP tool adapter implementing tool.Tool with mcp__{server}__{tool}
  naming convention and replace_default for swapping built-in tools
- MCP manager for multi-server orchestration with parallel startup,
  tool discovery, and registry integration
- Plugin system with plugin.json manifest (name/version/capabilities),
  directory-based discovery (global + project scopes with precedence),
  loader that merges skills/hooks/MCP configs into existing registries,
  and install/uninstall/list lifecycle manager
- Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out
  enabled/disabled resolution
- TUI /plugins command for listing installed plugins
- 54 tests across internal/mcp and internal/plugin packages
2026-04-12 03:09:05 +02:00
vikingowl 8d97c6cd39 docs: mark M8.2 skill system deliverables complete in milestones.md 2026-04-07 02:25:29 +02:00
vikingowl 24f4a739a6 docs: mark M8.1 hook system deliverables complete in milestones.md 2026-04-07 01:09:07 +02:00
vikingowl fef38b3502 docs: M8.1 hook system design spec 2026-04-06 02:42:34 +02:00
vikingowl 2c0ff5ff1f docs: mark M7 deliverables complete in milestones.md 2026-04-06 00:59:16 +02:00
vikingowl 43dcc7e9de docs: M6/M7 close-out implementation plan — 8 tasks, TDD, full file map 2026-04-05 21:33:42 +02:00
vikingowl 252ffde732 docs: M6/M7 close-out design spec — tool persistence, tokenizer, router feedback, coordinator 2026-04-05 21:22:26 +02:00
vikingowl abb3e3ca90 feat: spawn_elfs batch tool for guaranteed parallel elf execution
New spawn_elfs tool takes array of tasks, spawns all elfs simultaneously.
Solves the problem of models (Mistral Small, Devstral) that serialize
tool calls instead of batching them.

Schema: {"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30}

Also:
- Suppress spawn_elfs tool output from chat (tree handles display)
- Update M7 milestones to reflect completed deliverables
- Add CC-inspired features to M8/M10: task notification system,
  task framework, /batch skill, coordinator mode, StreamingToolExecutor,
  git worktree isolation
2026-04-03 21:03:51 +02:00
vikingowl 8e5ddb20cb feat: hybrid system inventory — dynamic PATH scan + runtime probing
No hardcoded tool lists. Scans all $PATH directories for executables
(5541 on this system), then probes known runtime patterns for version
info (23 detected: Go, Python, Node, Rust, Ruby, Perl, Java, Dart,
Deno, Bun, Lua, LuaJIT, Guile, GCC, Clang, NASM + package managers).

System prompt includes: OS, shell, runtime versions, and notable
tools (git, docker, kubectl, fzf, rg, etc.) from the full PATH scan.
Total executable count reported so the LLM knows the full scope.

Milestones updated: M6 fixed context prefix, M12 multimodality.
2026-04-03 14:36:22 +02:00
vikingowl c54471a37b refactor: migrate mistral sdk to github.com/VikingOwl91/mistral-go-sdk
Same package, new GitHub deployment with fixed tests.
somegit.dev/vikingowl → github.com/VikingOwl91, v1.2.0 → v1.2.1
2026-04-03 12:06:59 +02:00
vikingowl 69f5dba091 feat: complete M1 — core engine with Mistral provider
Mistral provider adapter with streaming, tool calls (single-chunk
pattern), stop reason inference, model listing, capabilities, and
JSON output support.

Tool system: bash (7 security checks, shell alias harvesting for
bash/zsh/fish), file ops (read, write, edit, glob, grep, ls).
Alias harvesting collects 300+ aliases from user's shell config.

Engine agentic loop: stream → tool execution → re-query → until
done. Tool gating on model capabilities. Max turns safety limit.

CLI pipe mode: echo "prompt" | gnoma streams response to stdout.
Flags: --provider, --model, --system, --api-key, --max-turns,
--verbose, --version.

Provider interface expanded: Models(), DefaultModel(), Capabilities
(ToolUse, JSONOutput, Vision, Thinking, ContextWindow, MaxOutput),
ResponseFormat with JSON schema support.

Live verified: text streaming + tool calling with devstral-small.
117 tests across 8 packages, 10MB binary.
2026-04-03 12:01:55 +02:00
vikingowl 951ab3b970 docs: update essentials for router, security, task learning
Restructure milestones from M1-M11 to M1-M15:
- M3: Security Firewall (secret scanner, incognito mode)
- M4: Router Foundation (arm registry, pools, task classifier)
- M5: TUI with full 6 permission modes
- M6: Full compaction (truncate + LLM summarization)
- M9: Router Advanced (bandit learning, ensemble strategies)
- M11: Task Learning (pattern detection, persistent tasks)

Add ADR-007 through ADR-012 for security-as-core, router split,
Thompson Sampling, MCP replaceability, task learning, incognito.

Add risks R-010 through R-015 for router, security, feedback,
task learning, ensemble quality, shell parser.

Update architecture dependency graph with security, router,
elf, hook, skill, mcp, plugin, tasklearn packages.

Update domain model with Router, Arm, LimitPool, Firewall entities.
2026-04-03 10:47:11 +02:00
vikingowl 154d978564 docs: add project essentials (12/12 complete)
Vision, domain model, architecture, patterns, process flows,
UML diagrams, API contracts, tech stack, constraints, milestones
(M1-M11), decision log (6 ADRs), and risk register.

Key decisions: single binary, pull-based streaming, Mistral as M1
reference provider, discriminated unions, multi-provider collaboration
as core identity.
2026-04-02 18:09:07 +02:00