16 Commits

Author SHA1 Message Date
9b1d6ca100 test: M7 audit — quality feedback, coordinator, agent tool coverage
Quality feedback integration: TestQualityTracker_InfluencesArmSelection
verifies that 5 successes vs 5 failures tips Router.Select() to the
high-quality arm once EMA has enough observations. Companion test
confirms heuristic fallback below minObservations.

Coordinator tests expanded from 2 → 5: added guidance content check
(parallel/serial/synthesize present), false-positive table extended with
7 cases including the reordered keywords from the previous fix.

Agent tool suite: tool interface contracts for all four tools (Name,
Description, Parameters validity, IsReadOnly). Extracted duplicated
2000-char truncation into truncateOutput() helper (format.go), removing
the inline copies in agent.go and batch.go. Four boundary tests cover
empty, short, exact-max, and over-max cases.
2026-04-06 00:59:12 +02:00
62112cff55 feat: list_results + read_result tools for coordinator artifact discovery 2026-04-05 22:19:05 +02:00
7c991a9c68 feat: list_results + read_result tools for coordinator artifact discovery 2026-04-05 22:15:04 +02:00
6cf5e92957 feat: QualityTracker — EMA router feedback from elf outcomes, ResultFilePaths tracking 2026-04-05 22:08:08 +02:00
d251dd7507 feat: wire persist.Store into engine, elf manager, and agent tools 2026-04-05 21:59:55 +02:00
cb2d63d06f feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
  args in the first streaming chunk then repeats them as delta, causing
  doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output

cmd/gnoma:
- Early TTY detection so logger is created with correct destination
  before any component gets a reference to it (fixes slog WARN bleed
  into TUI textarea)

permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
  text may legitimately mention .env/.ssh/credentials patterns and
  should not be blocked

tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
  (ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
  content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
  and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
  <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
  from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
2026-04-05 19:24:51 +02:00
14b88cadcc feat: M1-M7 gap audit phase 3 — context prefix, deferred tools, compact hooks
Gap 11 (M6): Fixed context prefix
- Window.PrefixMessages stores immutable docs (CLAUDE.md, .gnoma/GNOMA.md)
- Prefix stripped before compaction, prepended after — survives all compaction
- AllMessages() returns prefix + history for provider requests
- main.go loads CLAUDE.md and .gnoma/GNOMA.md at startup as prefix

Gap 12 (M6): Deferred tool loading
- DeferrableTool optional interface: ShouldDefer() bool
- buildRequest() skips deferred tools until activated
- Tools auto-activate on first model request (activatedTools map)
- agent + spawn_elfs marked as deferrable (large schemas, rarely needed early)
- Saves ~800 tokens per deferred tool per request

Gap 13 (M6): Pre/post compact hooks
- OnPreCompact/OnPostCompact callbacks in WindowConfig
- Called in doCompact() (shared by CompactIfNeeded + ForceCompact)
- M8 hooks system will extend these to full protocol
2026-04-04 20:46:50 +02:00
509c897847 feat: M1-M7 gap audit phase 2 — security, TUI, context, router feedback
Gap 6 (M3): 7 new bash security checks (8-14)
- JQ injection, obfuscated flags (Unicode lookalike hyphens),
  /proc/environ access, brace expansion, Unicode whitespace,
  zsh dangerous constructs, comment-quote desync
- Total: 14 checks (was 7)

Gap 7 (M5): Model picker numbered selection
- /model shows numbered sorted list, /model 3 picks by number

Gap 8 (M5): /config set command
- /config set provider.default mistral writes to .gnoma/config.toml
- Whitelisted keys: provider.default, provider.model, permission.mode
- New config/write.go with TOML round-trip via BurntSushi/toml

Gap 9 (M6): Simple token estimator
- EstimateTokens (len/4 heuristic), EstimateMessages (content + overhead)
- PreEstimate on Tracker for proactive compaction triggering

Gap 10 (M7): Router quality feedback from elfs
- Router.Outcome + ReportOutcome (logs for now, M9 bandit uses later)
- Manager tracks armID/taskType per elf via elfMeta map
- Manager.ReportResult called after elf completion in both agent + batch tools
2026-04-04 11:07:08 +02:00
38fc49a6c4 fix: retry with exponential backoff on 429, stagger elf spawns
Engine retries transient errors (429, 5xx) up to 4 times with
1s/2s/4s/8s backoff. Respects Retry-After header from provider.

Batch tool staggers elf spawns by 300ms to avoid rate limit bursts
when all elfs hit the API simultaneously (Mistral's 1 req/s limit).
2026-04-03 21:08:20 +02:00
ace9b5f273 feat: spawn_elfs batch tool for guaranteed parallel elf execution
New spawn_elfs tool takes array of tasks, spawns all elfs simultaneously.
Solves the problem of models (Mistral Small, Devstral) that serialize
tool calls instead of batching them.

Schema: {"tasks": [{"prompt": "...", "task_type": "..."}], "max_turns": 30}

Also:
- Suppress spawn_elfs tool output from chat (tree handles display)
- Update M7 milestones to reflect completed deliverables
- Add CC-inspired features to M8/M10: task notification system,
  task framework, /batch skill, coordinator mode, StreamingToolExecutor,
  git worktree isolation
2026-04-03 21:03:51 +02:00
706363f94b feat: rate limit pools, elf tree view, permission prompts, dep updates
Rate limits:
- Add PoolRPS/PoolTPM/PoolTokensMonth/PoolCostMonth pool kinds
- Provider defaults for Mistral/Anthropic/OpenAI/Google (tier-aware)
- Config override via [rate_limits.<provider>] TOML section
- Pools auto-attached to arms on registration

Elf tree view (CC-style):
- Structured elf.Progress type replaces flat string channel
- Tree with ├─/└─ branches, per-elf stats (tool uses, tokens)
- Live activity updates: tool calls, "generating… (N chars)"
- Completed elfs stay in tree with "Done (duration)" until turn ends
- Suppress raw elf output from chat (tree + LLM summary instead)
- Remove background elf mode (wait: false) — always wait
- Truncate elf results to 2000 chars for parent context
- Parallel hint in system prompt and tool description

Permission prompts:
- Show actual command in prompt: "bash wants to execute: find . -name '*.go'"
- Compact hint in separator bar: "⚠ bash: find . | wc -l [y/n]"
- PermReqMsg carries tool name + args

Other:
- Fix /model not updating status bar (session.Local.SetModel)
- Add make targets: run, check, install
- Update deps: BurntSushi/toml v1.6.0, chroma v2.23.1, x/text v0.35.0, cloud.google.com/go v0.123.0
2026-04-03 20:54:48 +02:00
1f416bac8f fix: live elf progress shows tool calls + results, not just text 2026-04-03 19:42:48 +02:00
97d5093526 feat: configurable max_turns for elfs — LLM sets via agent tool param 2026-04-03 19:37:17 +02:00
2ccc261c39 fix: elf progress — proper last-2-lines tracking, 70 char truncation 2026-04-03 19:30:18 +02:00
e0cdc891f1 feat: live elf progress in TUI
- Elf tool calls show as 🦉 [elf] <prompt> (not ⚙ [agent])
- Live 2-line progress beneath the elf label showing what the
  elf is currently outputting (grey, auto-updated)
- Agent tool forwards elf streaming events via progress channel
- Progress cleared on turn completion
- elfProgressCh wired from agent tool → TUI
2026-04-03 19:25:43 +02:00
07c739795c feat: M7 Elfs — sub-agents with router-integrated spawning
internal/elf/:
- BackgroundElf: runs on own goroutine with independent engine,
  history, and provider. No shared mutable state.
- Manager: spawns elfs via router.Select() (picks best arm per
  task type), tracks lifecycle, WaitAll(), CancelAll(), Cleanup().

internal/tool/agent/:
- Agent tool: LLM can call 'agent' to spawn sub-agents.
  Supports task_type hint for routing, wait/background mode.
  5-minute timeout, context cancellation propagated.

Concurrent tool execution:
- Read-only tools (fs.read, fs.grep, fs.glob, etc.) execute in
  parallel via goroutines.
- Write tools (bash, fs.write, fs.edit) execute sequentially.
- Partition by tool.IsReadOnly().

TUI: /elf command explains how to use sub-agents.
5 elf tests. Exit criteria: parent spawns 3 background elfs on
different providers, collects and synthesizes results.
2026-04-03 19:16:46 +02:00