gnoma

Author	SHA1	Message	Date
vikingowl	9b1d6ca100	test: M7 audit — quality feedback, coordinator, agent tool coverage Quality feedback integration: TestQualityTracker_InfluencesArmSelection verifies that 5 successes vs 5 failures tips Router.Select() to the high-quality arm once EMA has enough observations. Companion test confirms heuristic fallback below minObservations. Coordinator tests expanded from 2 → 5: added guidance content check (parallel/serial/synthesize present), false-positive table extended with 7 cases including the reordered keywords from the previous fix. Agent tool suite: tool interface contracts for all four tools (Name, Description, Parameters validity, IsReadOnly). Extracted duplicated 2000-char truncation into truncateOutput() helper (format.go), removing the inline copies in agent.go and batch.go. Four boundary tests cover empty, short, exact-max, and over-max cases.	2026-04-06 00:59:12 +02:00
vikingowl	62112cff55	feat: list_results + read_result tools for coordinator artifact discovery	2026-04-05 22:19:05 +02:00
vikingowl	7c991a9c68	feat: list_results + read_result tools for coordinator artifact discovery	2026-04-05 22:15:04 +02:00
vikingowl	6cf5e92957	feat: QualityTracker — EMA router feedback from elf outcomes, ResultFilePaths tracking	2026-04-05 22:08:08 +02:00
vikingowl	d251dd7507	feat: wire persist.Store into engine, elf manager, and agent tools	2026-04-05 21:59:55 +02:00
vikingowl	fbb28de0b8	fix: persist.Store — sanitize callID, log save errors, document List filter semantics	2026-04-05 21:44:03 +02:00
vikingowl	ace3716204	feat: persist.Store — session-scoped /tmp tool result persistence	2026-04-05 21:38:45 +02:00
vikingowl	cb2d63d06f	feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes provider/openai: - Fix doubled tool call args (argsComplete flag): Ollama sends complete args in the first streaming chunk then repeats them as delta, causing doubled JSON and 400 errors in elfs - Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep) - Add Reasoning field support for Ollama thinking output cmd/gnoma: - Early TTY detection so logger is created with correct destination before any component gets a reference to it (fixes slog WARN bleed into TUI textarea) permission: - Exempt spawn_elfs and agent tools from safety scanner: elf prompt text may legitimately mention .env/.ssh/credentials patterns and should not be blocked tui/app: - /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge (ask for plain text output) → TUI fallback write from streamBuf - looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback content before writing (reject refusals, strip narrative preambles) - Collapse thinking output to 3 lines; ctrl+o to expand (live stream and committed messages) - Stream-level filter for model pseudo-tool-call blocks: suppresses <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|> from entering streamBuf across chunk boundaries - sanitizeAssistantText regex covers both block formats - Reset streamFilterClose at every turn start	2026-04-05 19:24:51 +02:00
vikingowl	14b88cadcc	feat: M1-M7 gap audit phase 3 — context prefix, deferred tools, compact hooks Gap 11 (M6): Fixed context prefix - Window.PrefixMessages stores immutable docs (CLAUDE.md, .gnoma/GNOMA.md) - Prefix stripped before compaction, prepended after — survives all compaction - AllMessages() returns prefix + history for provider requests - main.go loads CLAUDE.md and .gnoma/GNOMA.md at startup as prefix Gap 12 (M6): Deferred tool loading - DeferrableTool optional interface: ShouldDefer() bool - buildRequest() skips deferred tools until activated - Tools auto-activate on first model request (activatedTools map) - agent + spawn_elfs marked as deferrable (large schemas, rarely needed early) - Saves ~800 tokens per deferred tool per request Gap 13 (M6): Pre/post compact hooks - OnPreCompact/OnPostCompact callbacks in WindowConfig - Called in doCompact() (shared by CompactIfNeeded + ForceCompact) - M8 hooks system will extend these to full protocol	2026-04-04 20:46:50 +02:00
vikingowl	509c897847	feat: M1-M7 gap audit phase 2 — security, TUI, context, router feedback Gap 6 (M3): 7 new bash security checks (8-14) - JQ injection, obfuscated flags (Unicode lookalike hyphens), /proc/environ access, brace expansion, Unicode whitespace, zsh dangerous constructs, comment-quote desync - Total: 14 checks (was 7) Gap 7 (M5): Model picker numbered selection - /model shows numbered sorted list, /model 3 picks by number Gap 8 (M5): /config set command - /config set provider.default mistral writes to .gnoma/config.toml - Whitelisted keys: provider.default, provider.model, permission.mode - New config/write.go with TOML round-trip via BurntSushi/toml Gap 9 (M6): Simple token estimator - EstimateTokens (len/4 heuristic), EstimateMessages (content + overhead) - PreEstimate on Tracker for proactive compaction triggering Gap 10 (M7): Router quality feedback from elfs - Router.Outcome + ReportOutcome (logs for now, M9 bandit uses later) - Manager tracks armID/taskType per elf via elfMeta map - Manager.ReportResult called after elf completion in both agent + batch tools	2026-04-04 11:07:08 +02:00
vikingowl	38fc49a6c4	fix: retry with exponential backoff on 429, stagger elf spawns Engine retries transient errors (429, 5xx) up to 4 times with 1s/2s/4s/8s backoff. Respects Retry-After header from provider. Batch tool staggers elf spawns by 300ms to avoid rate limit bursts when all elfs hit the API simultaneously (Mistral's 1 req/s limit).	2026-04-03 21:08:20 +02:00
vikingowl	ace9b5f273	feat: spawn_elfs batch tool for guaranteed parallel elf execution New spawn_elfs tool takes array of tasks, spawns all elfs simultaneously. Solves the problem of models (Mistral Small, Devstral) that serialize tool calls instead of batching them. Schema: {"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30} Also: - Suppress spawn_elfs tool output from chat (tree handles display) - Update M7 milestones to reflect completed deliverables - Add CC-inspired features to M8/M10: task notification system, task framework, /batch skill, coordinator mode, StreamingToolExecutor, git worktree isolation	2026-04-03 21:03:51 +02:00
vikingowl	706363f94b	feat: rate limit pools, elf tree view, permission prompts, dep updates Rate limits: - Add PoolRPS/PoolTPM/PoolTokensMonth/PoolCostMonth pool kinds - Provider defaults for Mistral/Anthropic/OpenAI/Google (tier-aware) - Config override via [rate_limits.<provider>] TOML section - Pools auto-attached to arms on registration Elf tree view (CC-style): - Structured elf.Progress type replaces flat string channel - Tree with ├─/└─ branches, per-elf stats (tool uses, tokens) - Live activity updates: tool calls, "generating… (N chars)" - Completed elfs stay in tree with "Done (duration)" until turn ends - Suppress raw elf output from chat (tree + LLM summary instead) - Remove background elf mode (wait: false) — always wait - Truncate elf results to 2000 chars for parent context - Parallel hint in system prompt and tool description Permission prompts: - Show actual command in prompt: "bash wants to execute: find . -name '*.go'" - Compact hint in separator bar: "⚠ bash: find . | wc -l [y/n]" - PermReqMsg carries tool name + args Other: - Fix /model not updating status bar (session.Local.SetModel) - Add make targets: run, check, install - Update deps: BurntSushi/toml v1.6.0, chroma v2.23.1, x/text v0.35.0, cloud.google.com/go v0.123.0	2026-04-03 20:54:48 +02:00
vikingowl	1f416bac8f	fix: live elf progress shows tool calls + results, not just text	2026-04-03 19:42:48 +02:00
vikingowl	97d5093526	feat: configurable max_turns for elfs — LLM sets via agent tool param	2026-04-03 19:37:17 +02:00
vikingowl	2ccc261c39	fix: elf progress — proper last-2-lines tracking, 70 char truncation	2026-04-03 19:30:18 +02:00
vikingowl	e0cdc891f1	feat: live elf progress in TUI - Elf tool calls show as 🦉 [elf] <prompt> (not ⚙ [agent]) - Live 2-line progress beneath the elf label showing what the elf is currently outputting (grey, auto-updated) - Agent tool forwards elf streaming events via progress channel - Progress cleared on turn completion - elfProgressCh wired from agent tool → TUI	2026-04-03 19:25:43 +02:00
vikingowl	07c739795c	feat: M7 Elfs — sub-agents with router-integrated spawning internal/elf/: - BackgroundElf: runs on own goroutine with independent engine, history, and provider. No shared mutable state. - Manager: spawns elfs via router.Select() (picks best arm per task type), tracks lifecycle, WaitAll(), CancelAll(), Cleanup(). internal/tool/agent/: - Agent tool: LLM can call 'agent' to spawn sub-agents. Supports task_type hint for routing, wait/background mode. 5-minute timeout, context cancellation propagated. Concurrent tool execution: - Read-only tools (fs.read, fs.grep, fs.glob, etc.) execute in parallel via goroutines. - Write tools (bash, fs.write, fs.edit) execute sequentially. - Partition by tool.IsReadOnly(). TUI: /elf command explains how to use sub-agents. 5 elf tests. Exit criteria: parent spawns 3 background elfs on different providers, collects and synthesizes results.	2026-04-03 19:16:46 +02:00
vikingowl	4847421b17	feat: auto permission mode, edit diffs, truncated tool output - Default permission mode changed to 'auto' (read-only auto-allows, writes prompt) - fs.edit now shows diff-style output: line numbers, context ±3 lines, + for added (green), - for removed (red) - Tool output truncated to 10 lines in TUI with "+N lines (Ctrl+O to expand)" indicator - Mistral SDK bumped to v1.3.0	2026-04-03 18:57:13 +02:00
vikingowl	279a8d43bd	feat: complete 7/7 bash security checks Added: - Standalone semicolon check: blocks ; outside quotes (use && instead) - Sensitive redirection check: blocks > to /etc/passwd, .bashrc, .ssh/authorized_keys, .env, etc. Now all 7 security checks are active: 1. Incomplete commands, 2. Control characters, 3. Newline injection, 4. Command substitution, 5. Dangerous variables, 6. Semicolons, 7. Sensitive redirections	2026-04-03 17:56:01 +02:00
vikingowl	11a7a51d9d	feat: compact system inventory with queryable system_info tool System prompt gets a one-line summary (~200 chars): OS, CPU, RAM, GPU, top runtimes, package count, PATH command count. Full details available on demand via system_info tool with sections: runtimes, packages, tools, hardware, all. LLM calls the tool when it needs specifics — saves thousands of tokens per request. Hardware detection: CPU model, core count, total RAM, GPU via lspci. Package manager: pacman/apt/dnf/brew with dev package filtering. PATH scan: 5541 executables. Runtime probing: 22 detected.	2026-04-03 14:50:33 +02:00
vikingowl	d02b544e08	feat: hybrid system inventory — dynamic PATH scan + runtime probing No hardcoded tool lists. Scans all $PATH directories for executables (5541 on this system), then probes known runtime patterns for version info (23 detected: Go, Python, Node, Rust, Ruby, Perl, Java, Dart, Deno, Bun, Lua, LuaJIT, Guile, GCC, Clang, NASM + package managers). System prompt includes: OS, shell, runtime versions, and notable tools (git, docker, kubectl, fzf, rg, etc.) from the full PATH scan. Total executable count reported so the LLM knows the full scope. Milestones updated: M6 fixed context prefix, M12 multimodality.	2026-04-03 14:36:22 +02:00
vikingowl	f0633d8ac6	feat: complete M1 — core engine with Mistral provider Mistral provider adapter with streaming, tool calls (single-chunk pattern), stop reason inference, model listing, capabilities, and JSON output support. Tool system: bash (7 security checks, shell alias harvesting for bash/zsh/fish), file ops (read, write, edit, glob, grep, ls). Alias harvesting collects 300+ aliases from user's shell config. Engine agentic loop: stream → tool execution → re-query → until done. Tool gating on model capabilities. Max turns safety limit. CLI pipe mode: echo "prompt" | gnoma streams response to stdout. Flags: --provider, --model, --system, --api-key, --max-turns, --verbose, --version. Provider interface expanded: Models(), DefaultModel(), Capabilities (ToolUse, JSONOutput, Vision, Thinking, ContextWindow, MaxOutput), ResponseFormat with JSON schema support. Live verified: text streaming + tool calling with devstral-small. 117 tests across 8 packages, 10MB binary.	2026-04-03 12:01:55 +02:00

23 Commits
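
Note on commits 6cf5e92957 and 9b1d6ca100: the messages describe a QualityTracker that feeds an exponential moving average (EMA) of elf outcomes back into arm selection, with a heuristic fallback below minObservations. The following Go sketch shows one way such a tracker could work. The type, method names, and smoothing factor are assumptions made for illustration and do not come from the repository.

```go
package main

import "fmt"

// qualityTracker is a hypothetical stand-in for the QualityTracker named in the log.
type qualityTracker struct {
	alpha           float64            // EMA smoothing factor (assumed value)
	minObservations int                // observations required before the EMA is trusted
	score           map[string]float64 // per-arm EMA of outcomes
	count           map[string]int     // per-arm observation count
}

func newQualityTracker(alpha float64, minObs int) *qualityTracker {
	return &qualityTracker{
		alpha:           alpha,
		minObservations: minObs,
		score:           map[string]float64{},
		count:           map[string]int{},
	}
}

// record folds one outcome (success = 1, failure = 0) into the arm's EMA.
func (q *qualityTracker) record(arm string, success bool) {
	v := 0.0
	if success {
		v = 1.0
	}
	if q.count[arm] == 0 {
		q.score[arm] = v // the first observation seeds the average
	} else {
		q.score[arm] = q.alpha*v + (1-q.alpha)*q.score[arm]
	}
	q.count[arm]++
}

// quality returns the EMA and true only once the arm has enough observations.
func (q *qualityTracker) quality(arm string) (float64, bool) {
	if q.count[arm] < q.minObservations {
		return 0, false
	}
	return q.score[arm], true
}

// pick returns the arm with the highest trusted EMA, or fallback (standing in
// for the heuristic choice) when no arm has enough observations yet.
func (q *qualityTracker) pick(arms []string, fallback string) string {
	best, bestScore, found := fallback, 0.0, false
	for _, a := range arms {
		if s, ok := q.quality(a); ok && (!found || s > bestScore) {
			best, bestScore, found = a, s, true
		}
	}
	return best
}

func main() {
	q := newQualityTracker(0.3, 5)
	for i := 0; i < 5; i++ {
		q.record("arm-a", true)
		q.record("arm-b", false)
	}
	fmt.Println(q.pick([]string{"arm-b", "arm-a"}, "arm-b"))
}
```

With five successes on one arm and five failures on the other, the sketch selects the successful arm. With fewer than minObservations outcomes recorded, it returns the fallback unchanged.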