23 Commits

Author SHA1 Message Date
9b1d6ca100 test: M7 audit — quality feedback, coordinator, agent tool coverage
Quality feedback integration: TestQualityTracker_InfluencesArmSelection
verifies that an arm with 5 successes is preferred by Router.Select()
over an arm with 5 failures once the EMA has enough observations. A
companion test confirms the heuristic fallback below minObservations.

Coordinator tests expanded from 2 → 5: added guidance content check
(parallel/serial/synthesize present), false-positive table extended with
7 cases including the reordered keywords from the previous fix.

Agent tool suite: tool interface contracts for all four tools (Name,
Description, Parameters validity, IsReadOnly). Extracted duplicated
2000-char truncation into truncateOutput() helper (format.go), removing
the inline copies in agent.go and batch.go. Four boundary tests cover
empty, short, exact-max, and over-max cases.
2026-04-06 00:59:12 +02:00
62112cff55 feat: list_results + read_result tools for coordinator artifact discovery 2026-04-05 22:19:05 +02:00
7c991a9c68 feat: list_results + read_result tools for coordinator artifact discovery 2026-04-05 22:15:04 +02:00
6cf5e92957 feat: QualityTracker — EMA router feedback from elf outcomes, ResultFilePaths tracking 2026-04-05 22:08:08 +02:00
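
For readers unfamiliar with EMA-based feedback, the idea behind QualityTracker can be sketched as follows. This is an illustration under assumptions, not the project's code: the Record and Score method names, the alpha smoothing factor, and the success-as-1.0 encoding are hypothetical; only the QualityTracker and minObservations names appear in the log.

```go
package main

import "fmt"

// armStats holds the running average and observation count for one arm.
type armStats struct {
	ema float64
	n   int
}

// QualityTracker keeps an exponential moving average of outcomes per arm.
type QualityTracker struct {
	alpha           float64 // smoothing factor in (0, 1]; hypothetical
	minObservations int
	arms            map[string]*armStats
}

func NewQualityTracker(alpha float64, minObservations int) *QualityTracker {
	return &QualityTracker{
		alpha:           alpha,
		minObservations: minObservations,
		arms:            make(map[string]*armStats),
	}
}

// Record folds one outcome into the arm's EMA (success = 1.0, failure = 0.0).
func (q *QualityTracker) Record(armID string, success bool) {
	v := 0.0
	if success {
		v = 1.0
	}
	s, ok := q.arms[armID]
	if !ok {
		q.arms[armID] = &armStats{ema: v, n: 1}
		return
	}
	s.ema = q.alpha*v + (1-q.alpha)*s.ema
	s.n++
}

// Score reports the EMA and true once the arm has enough observations.
// It returns false otherwise, so a caller can fall back to a heuristic.
func (q *QualityTracker) Score(armID string) (float64, bool) {
	s, ok := q.arms[armID]
	if !ok || s.n < q.minObservations {
		return 0, false
	}
	return s.ema, true
}

func main() {
	q := NewQualityTracker(0.5, 5)
	for i := 0; i < 5; i++ {
		q.Record("good-arm", true)
		q.Record("bad-arm", false)
	}
	good, _ := q.Score("good-arm")
	bad, _ := q.Score("bad-arm")
	fmt.Println("good:", good, "bad:", bad)
}
```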
d251dd7507 feat: wire persist.Store into engine, elf manager, and agent tools 2026-04-05 21:59:55 +02:00
fbb28de0b8 fix: persist.Store — sanitize callID, log save errors, document List filter semantics 2026-04-05 21:44:03 +02:00
ace3716204 feat: persist.Store — session-scoped /tmp tool result persistence 2026-04-05 21:38:45 +02:00
cb2d63d06f feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
  args in the first streaming chunk then repeats them as delta, causing
  doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
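
The doubled-args fix can be illustrated with a small accumulator. This is a sketch under assumptions: the log only names the argsComplete flag, so the type, the Add method, and the use of json.Valid to decide when the arguments are complete are hypothetical choices, not the provider's actual logic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toolCallAccumulator collects streamed argument fragments for one tool call.
type toolCallAccumulator struct {
	args         strings.Builder
	argsComplete bool
}

// Add appends a fragment unless the arguments already form complete JSON.
// Once complete, later fragments (such as a repeated full payload) are ignored.
func (a *toolCallAccumulator) Add(fragment string) {
	if a.argsComplete {
		return
	}
	a.args.WriteString(fragment)
	if json.Valid([]byte(a.args.String())) {
		a.argsComplete = true
	}
}

// Args returns the accumulated argument JSON.
func (a *toolCallAccumulator) Args() string {
	return a.args.String()
}

func main() {
	var acc toolCallAccumulator
	// Simulates a server that sends the full args first, then repeats them.
	acc.Add(`{"pattern":"TODO"}`)
	acc.Add(`{"pattern":"TODO"}`)
	fmt.Println(acc.Args())
}
```

Ordinary incremental streaming still works with this approach, because partial fragments are not valid JSON and keep accumulating until the object closes.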

cmd/gnoma:
- Early TTY detection so logger is created with correct destination
  before any component gets a reference to it (fixes slog WARN bleed
  into TUI textarea)

permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
  text may legitimately mention .env/.ssh/credentials patterns and
  should not be blocked

tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
  (ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
  content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
  and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
  <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
  from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
2026-04-05 19:24:51 +02:00
14b88cadcc feat: M1-M7 gap audit phase 3 — context prefix, deferred tools, compact hooks
Gap 11 (M6): Fixed context prefix
- Window.PrefixMessages stores immutable docs (CLAUDE.md, .gnoma/GNOMA.md)
- Prefix stripped before compaction, prepended after — survives all compaction
- AllMessages() returns prefix + history for provider requests
- main.go loads CLAUDE.md and .gnoma/GNOMA.md at startup as prefix
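
A minimal sketch of the fixed-prefix idea, assuming a simplified Message type and a caller-supplied summarizer. Window.PrefixMessages and AllMessages() are named in the log; the history field, Append, and the Compact signature are hypothetical. Here the prefix is simply stored apart from the history, which has the same effect as stripping it before compaction and prepending it afterwards.

```go
package main

import "fmt"

// Message is a simplified chat message; the real type likely has more fields.
type Message struct {
	Role    string
	Content string
}

// Window keeps immutable prefix documents separate from compactable history.
type Window struct {
	PrefixMessages []Message
	history        []Message
}

// Append adds a message to the compactable history.
func (w *Window) Append(m Message) {
	w.history = append(w.history, m)
}

// AllMessages returns prefix + history, as sent to the provider.
func (w *Window) AllMessages() []Message {
	out := make([]Message, 0, len(w.PrefixMessages)+len(w.history))
	out = append(out, w.PrefixMessages...)
	return append(out, w.history...)
}

// Compact replaces the history with a single summary message.
// The prefix never passes through the summarizer, so it survives intact.
func (w *Window) Compact(summarize func([]Message) Message) {
	w.history = []Message{summarize(w.history)}
}

func main() {
	w := &Window{PrefixMessages: []Message{{Role: "system", Content: "project docs"}}}
	w.Append(Message{Role: "user", Content: "first question"})
	w.Append(Message{Role: "assistant", Content: "first answer"})
	w.Compact(func(h []Message) Message {
		return Message{Role: "system", Content: fmt.Sprintf("summary of %d messages", len(h))}
	})
	for _, m := range w.AllMessages() {
		fmt.Println(m.Role+":", m.Content)
	}
}
```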

Gap 12 (M6): Deferred tool loading
- DeferrableTool optional interface: ShouldDefer() bool
- buildRequest() skips deferred tools until activated
- Tools auto-activate on first model request (activatedTools map)
- agent + spawn_elfs marked as deferrable (large schemas, rarely needed early)
- Saves ~800 tokens per deferred tool per request
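
The optional-interface pattern behind deferred loading can be sketched like this. DeferrableTool, ShouldDefer(), and the activatedTools map are named in the log; the Tool interface shape, the visibleTools function, and the two example tool types are assumptions, and how a tool becomes activated is left out.

```go
package main

import "fmt"

// Tool is a reduced stand-in for the project's tool interface.
type Tool interface {
	Name() string
}

// DeferrableTool is the optional interface: tools that implement it and
// return true are left out of requests until they are activated.
type DeferrableTool interface {
	ShouldDefer() bool
}

type basicTool struct{ name string }

func (t basicTool) Name() string { return t.name }

type deferredTool struct{ name string }

func (t deferredTool) Name() string      { return t.name }
func (t deferredTool) ShouldDefer() bool { return true }

// visibleTools returns the tools whose schemas would go into a request:
// every non-deferrable tool, plus deferrable tools already activated.
func visibleTools(all []Tool, activated map[string]bool) []Tool {
	var out []Tool
	for _, t := range all {
		if d, ok := t.(DeferrableTool); ok && d.ShouldDefer() && !activated[t.Name()] {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	tools := []Tool{basicTool{"fs.read"}, deferredTool{"agent"}, deferredTool{"spawn_elfs"}}
	activated := map[string]bool{}
	fmt.Println("before activation:", len(visibleTools(tools, activated)))
	activated["agent"] = true
	fmt.Println("after activating agent:", len(visibleTools(tools, activated)))
}
```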

Gap 13 (M6): Pre/post compact hooks
- OnPreCompact/OnPostCompact callbacks in WindowConfig
- Called in doCompact() (shared by CompactIfNeeded + ForceCompact)
- M8 hooks system will extend these to full protocol
2026-04-04 20:46:50 +02:00
509c897847 feat: M1-M7 gap audit phase 2 — security, TUI, context, router feedback
Gap 6 (M3): 7 new bash security checks (8-14)
- JQ injection, obfuscated flags (Unicode lookalike hyphens),
  /proc/environ access, brace expansion, Unicode whitespace,
  zsh dangerous constructs, comment-quote desync
- Total: 14 checks (was 7)

Gap 7 (M5): Model picker numbered selection
- /model shows numbered sorted list, /model 3 picks by number

Gap 8 (M5): /config set command
- /config set provider.default mistral writes to .gnoma/config.toml
- Whitelisted keys: provider.default, provider.model, permission.mode
- New config/write.go with TOML round-trip via BurntSushi/toml

Gap 9 (M6): Simple token estimator
- EstimateTokens (len/4 heuristic), EstimateMessages (content + overhead)
- PreEstimate on Tracker for proactive compaction triggering
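
The len/4 heuristic is simple enough to show in full. In this sketch the function names follow the log, but the Message type and the per-message overhead value of 4 tokens are assumptions; the log only says "content + overhead".

```go
package main

import "fmt"

// Message is a simplified chat message used only for this illustration.
type Message struct {
	Role    string
	Content string
}

// perMessageOverhead is a hypothetical fixed cost per message (role, framing).
const perMessageOverhead = 4

// EstimateTokens approximates the token count as one token per 4 bytes.
func EstimateTokens(s string) int {
	return len(s) / 4
}

// EstimateMessages sums the content estimate plus a fixed overhead per message.
func EstimateMessages(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += EstimateTokens(m.Content) + perMessageOverhead
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "List the Go files in this repository."},
		{Role: "assistant", Content: "Running fs.glob for **/*.go"},
	}
	fmt.Println("estimated tokens:", EstimateMessages(msgs))
}
```

An estimate this coarse is only useful for deciding when to compact early; actual usage still has to come from the provider's reported counts.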

Gap 10 (M7): Router quality feedback from elfs
- Router.Outcome + ReportOutcome (logs for now, M9 bandit uses later)
- Manager tracks armID/taskType per elf via elfMeta map
- Manager.ReportResult called after elf completion in both agent + batch tools
2026-04-04 11:07:08 +02:00
38fc49a6c4 fix: retry with exponential backoff on 429, stagger elf spawns
Engine retries transient errors (429, 5xx) up to 4 times with
1s/2s/4s/8s backoff. Respects Retry-After header from provider.

Batch tool staggers elf spawns by 300ms to avoid rate limit bursts
when all elfs hit the API simultaneously (Mistral's 1 req/s limit).
2026-04-03 21:08:20 +02:00
ace9b5f273 feat: spawn_elfs batch tool for guaranteed parallel elf execution
New spawn_elfs tool takes array of tasks, spawns all elfs simultaneously.
Solves the problem of models (Mistral Small, Devstral) that serialize
tool calls instead of batching them.

Schema: {"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30}
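
As an illustration, the schema above maps onto Go structs along these lines. The JSON field names come from the schema; the struct names, the parse function, and the empty-tasks check are assumptions of this sketch.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// elfTask is one entry of the "tasks" array.
type elfTask struct {
	Prompt   string `json:"prompt"`
	TaskType string `json:"task_type"`
}

// spawnElfsParams mirrors the spawn_elfs argument object.
type spawnElfsParams struct {
	Tasks    []elfTask `json:"tasks"`
	MaxTurns int       `json:"max_turns"`
}

// parseSpawnElfs decodes the tool arguments and rejects an empty task list.
func parseSpawnElfs(raw []byte) (spawnElfsParams, error) {
	var p spawnElfsParams
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, err
	}
	if len(p.Tasks) == 0 {
		return p, errors.New("tasks must not be empty")
	}
	return p, nil
}

func main() {
	raw := []byte(`{"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30}`)
	p, err := parseSpawnElfs(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d task(s), max_turns=%d\n", len(p.Tasks), p.MaxTurns)
}
```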

Also:
- Suppress spawn_elfs tool output from chat (tree handles display)
- Update M7 milestones to reflect completed deliverables
- Add CC-inspired features to M8/M10: task notification system,
  task framework, /batch skill, coordinator mode, StreamingToolExecutor,
  git worktree isolation
2026-04-03 21:03:51 +02:00
706363f94b feat: rate limit pools, elf tree view, permission prompts, dep updates
Rate limits:
- Add PoolRPS/PoolTPM/PoolTokensMonth/PoolCostMonth pool kinds
- Provider defaults for Mistral/Anthropic/OpenAI/Google (tier-aware)
- Config override via [rate_limits.<provider>] TOML section
- Pools auto-attached to arms on registration

Elf tree view (CC-style):
- Structured elf.Progress type replaces flat string channel
- Tree with ├─/└─ branches, per-elf stats (tool uses, tokens)
- Live activity updates: tool calls, "generating… (N chars)"
- Completed elfs stay in tree with "Done (duration)" until turn ends
- Suppress raw elf output from chat (tree + LLM summary instead)
- Remove background elf mode (wait: false) — always wait
- Truncate elf results to 2000 chars for parent context
- Parallel hint in system prompt and tool description

Permission prompts:
- Show actual command in prompt: "bash wants to execute: find . -name '*.go'"
- Compact hint in separator bar: "⚠ bash: find . | wc -l [y/n]"
- PermReqMsg carries tool name + args

Other:
- Fix /model not updating status bar (session.Local.SetModel)
- Add make targets: run, check, install
- Update deps: BurntSushi/toml v1.6.0, chroma v2.23.1, x/text v0.35.0, cloud.google.com/go v0.123.0
2026-04-03 20:54:48 +02:00
1f416bac8f fix: live elf progress shows tool calls + results, not just text 2026-04-03 19:42:48 +02:00
97d5093526 feat: configurable max_turns for elfs — LLM sets via agent tool param 2026-04-03 19:37:17 +02:00
2ccc261c39 fix: elf progress — proper last-2-lines tracking, 70 char truncation 2026-04-03 19:30:18 +02:00
e0cdc891f1 feat: live elf progress in TUI
- Elf tool calls show as 🦉 [elf] <prompt> (not ⚙ [agent])
- Live 2-line progress beneath the elf label showing what the
  elf is currently outputting (grey, auto-updated)
- Agent tool forwards elf streaming events via progress channel
- Progress cleared on turn completion
- elfProgressCh wired from agent tool → TUI
2026-04-03 19:25:43 +02:00
07c739795c feat: M7 Elfs — sub-agents with router-integrated spawning
internal/elf/:
- BackgroundElf: runs on own goroutine with independent engine,
  history, and provider. No shared mutable state.
- Manager: spawns elfs via router.Select() (picks best arm per
  task type), tracks lifecycle, WaitAll(), CancelAll(), Cleanup().

internal/tool/agent/:
- Agent tool: LLM can call 'agent' to spawn sub-agents.
  Supports task_type hint for routing, wait/background mode.
  5-minute timeout, context cancellation propagated.

Concurrent tool execution:
- Read-only tools (fs.read, fs.grep, fs.glob, etc.) execute in
  parallel via goroutines.
- Write tools (bash, fs.write, fs.edit) execute sequentially.
- Partition by tool.IsReadOnly().
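
The partitioning rule can be sketched as below. Only IsReadOnly() comes from the log; the Tool interface shape, the call type, and the execute function are hypothetical, and running all read-only calls before the write calls is an assumption about ordering that the log does not state.

```go
package main

import (
	"fmt"
	"sync"
)

// Tool is a reduced stand-in for the project's tool interface.
type Tool interface {
	Name() string
	IsReadOnly() bool
}

type fakeTool struct {
	name     string
	readOnly bool
}

func (t fakeTool) Name() string     { return t.name }
func (t fakeTool) IsReadOnly() bool { return t.readOnly }

// call pairs a tool with the work to run for one tool invocation.
type call struct {
	tool Tool
	run  func() string
}

// execute runs read-only calls concurrently and write calls one at a time.
// Results keep the order of the input calls.
func execute(calls []call) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	var writes []int
	for i, c := range calls {
		if !c.tool.IsReadOnly() {
			writes = append(writes, i)
			continue
		}
		wg.Add(1)
		go func(i int, c call) {
			defer wg.Done()
			results[i] = c.run() // each goroutine writes its own slot
		}(i, c)
	}
	wg.Wait()
	for _, i := range writes {
		results[i] = calls[i].run() // sequential: no two writes overlap
	}
	return results
}

func main() {
	calls := []call{
		{fakeTool{"fs.read", true}, func() string { return "read ok" }},
		{fakeTool{"bash", false}, func() string { return "bash ok" }},
		{fakeTool{"fs.grep", true}, func() string { return "grep ok" }},
	}
	fmt.Println(execute(calls))
}
```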

TUI: /elf command explains how to use sub-agents.
5 elf tests. Exit criteria: parent spawns 3 background elfs on
different providers, collects and synthesizes results.
2026-04-03 19:16:46 +02:00
4847421b17 feat: auto permission mode, edit diffs, truncated tool output
- Default permission mode changed to 'auto' (read-only auto-allows,
  writes prompt)
- fs.edit now shows diff-style output: line numbers, context ±3 lines,
  + for added (green), - for removed (red)
- Tool output truncated to 10 lines in TUI with "+N lines (Ctrl+O
  to expand)" indicator
- Mistral SDK bumped to v1.3.0
2026-04-03 18:57:13 +02:00
279a8d43bd feat: complete 7/7 bash security checks
Added:
- Standalone semicolon check: blocks ; outside quotes (use && instead)
- Sensitive redirection check: blocks > to /etc/passwd, .bashrc,
  .ssh/authorized_keys, .env, etc.

Now all 7 security checks are active:
1. Incomplete commands, 2. Control characters, 3. Newline injection,
4. Command substitution, 5. Dangerous variables, 6. Semicolons,
7. Sensitive redirections
2026-04-03 17:56:01 +02:00
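
To illustrate check 6 from the commit above, a scanner that flags a semicolon outside quotes could look like this. It is a simplified sketch, not the project's implementation: the function name is hypothetical, and treating a backslash-escaped semicolon as allowed is an assumption.

```go
package main

import "fmt"

// hasStandaloneSemicolon reports whether cmd contains a ';' that is not
// inside single quotes, inside double quotes, or escaped with a backslash.
func hasStandaloneSemicolon(cmd string) bool {
	var inSingle, inDouble, escaped bool
	for _, r := range cmd {
		switch {
		case escaped:
			escaped = false // this rune was escaped; take it literally
		case r == '\\' && !inSingle:
			escaped = true // backslash is literal inside single quotes
		case r == '\'' && !inDouble:
			inSingle = !inSingle
		case r == '"' && !inSingle:
			inDouble = !inDouble
		case r == ';' && !inSingle && !inDouble:
			return true
		}
	}
	return false
}

func main() {
	for _, cmd := range []string{
		`echo a; rm -rf build`,
		`echo "a;b"`,
		`echo 'a;b' && ls`,
	} {
		fmt.Printf("%-24q blocked=%v\n", cmd, hasStandaloneSemicolon(cmd))
	}
}
```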
11a7a51d9d feat: compact system inventory with queryable system_info tool
System prompt gets a one-line summary (~200 chars): OS, CPU, RAM,
GPU, top runtimes, package count, PATH command count.

Full details available on demand via system_info tool with sections:
runtimes, packages, tools, hardware, all. LLM calls the tool when
it needs specifics — saves thousands of tokens per request.

Hardware detection: CPU model, core count, total RAM, GPU via lspci.
Package manager: pacman/apt/dnf/brew with dev package filtering.
PATH scan: 5541 executables. Runtime probing: 22 detected.
2026-04-03 14:50:33 +02:00
d02b544e08 feat: hybrid system inventory — dynamic PATH scan + runtime probing
No hardcoded tool lists. Scans all $PATH directories for executables
(5541 on this system), then probes known runtime patterns for version
info (23 detected: Go, Python, Node, Rust, Ruby, Perl, Java, Dart,
Deno, Bun, Lua, LuaJIT, Guile, GCC, Clang, NASM + package managers).

System prompt includes: OS, shell, runtime versions, and notable
tools (git, docker, kubectl, fzf, rg, etc.) from the full PATH scan.
Total executable count reported so the LLM knows the full scope.

Milestones updated: M6 fixed context prefix, M12 multimodality.
2026-04-03 14:36:22 +02:00
f0633d8ac6 feat: complete M1 — core engine with Mistral provider
Mistral provider adapter with streaming, tool calls (single-chunk
pattern), stop reason inference, model listing, capabilities, and
JSON output support.

Tool system: bash (7 security checks, shell alias harvesting for
bash/zsh/fish), file ops (read, write, edit, glob, grep, ls).
Alias harvesting collects 300+ aliases from user's shell config.

Engine agentic loop: stream → tool execution → re-query → until
done. Tool gating on model capabilities. Max turns safety limit.

CLI pipe mode: echo "prompt" | gnoma streams response to stdout.
Flags: --provider, --model, --system, --api-key, --max-turns,
--verbose, --version.

Provider interface expanded: Models(), DefaultModel(), Capabilities
(ToolUse, JSONOutput, Vision, Thinking, ContextWindow, MaxOutput),
ResponseFormat with JSON schema support.

Live verified: text streaming + tool calling with devstral-small.
117 tests across 8 packages, 10MB binary.
2026-04-03 12:01:55 +02:00