36 Commits

Author SHA1 Message Date
vikingowl c065a2dea7 fix(provider/openai): wire ResponseFormat into OpenAI request params
provider.Request.ResponseFormat was being silently dropped by the
openai translation layer (translate.go:translateRequest). The
upstream provider type and the openai-go SDK both supported it; the
adapter just never propagated it.

This is why Move 1 (set ResponseFormat=ResponseJSON in the SLM
classifier) produced zero observable change: the field made it from
the classifier into provider.Request but stopped at the OpenAI
translation step. The ollama backend (used via the OpenAI-compatible
endpoint) therefore never received format=json_object and kept
emitting free-form prose, which the classifier's downstream JSON
parser duly rejected — 50 fallbacks in a row across two model swaps.

Translate provider.ResponseJSON to oai.ResponseFormatJSONObjectParam
and provider.ResponseText to oai.ResponseFormatTextParam; leave the
union zero-valued when the caller didn't set ResponseFormat so the
SDK omits the field per its omitzero tag. Three table cases cover
the json / text / unset paths.

Affects ollama, llama.cpp, llamafile, and any other backend reached
via openaicompat — all run through openai.translateRequest.
2026-05-25 01:26:38 +02:00
vikingowl d38d7daf25 fix(subprocess/agy): disable ToolUse until stream-json lands
agy is registered with FormatAgyText and the agyParser emits every
stdout line as a plain EventTextDelta. There is no path for a
structured ToolCall event to come back. With ToolUse=true the router
would dispatch tool-needing tasks (security_review, spawn_elfs, file
edit) to agy; the underlying Gemini model would describe calling the
tool in prose — invented UUIDs and 'I will pause now'-style stubs —
the engine would receive only text, and the turn would hang waiting
for a tool call that never arrives.

Surfaced when /init routed to agy for a security_review task and
elf spawning visibly hallucinated in the TUI. Capability flag
flipped to false; agy stays usable for tool-free prompts (explain,
summarize, simple chat). TODO entry for native stream-json updated
to flag that the capability flip is part of that same change.
2026-05-24 21:58:22 +02:00
vikingowl a23eb6b92c style: gofmt drift from prior commits
Pure whitespace cleanup surfaced when 'make check' ran gofmt over the
tree. Mostly struct-field column alignment in internal/safety/banner.go
(SessionInfo) and the var(...) flag block in cmd/gnoma/main.go after
--dangerously-allow-anywhere was added without realignment. Verified
zero substantive changes via 'git diff --ignore-all-space
--ignore-blank-lines'.
2026-05-24 16:33:17 +02:00
vikingowl 2f8d4c412f feat(router): cloud-arm defaults, gpt-5.3-codex registration
Closes R-4 and R-5 of the routing-defaults plan.

R-4: Strengths + CostWeight defaults for closed frontier models.
Cloud entries land in the same knownFamilyDefaults table as local
ones, with MaxComplexity intentionally left zero (cloud arms get
no complexity ceiling). CostWeight tuned per the plan's rationale:

  claude-opus-4-7    → Planning/SecurityReview/Debug/Refactor, 0.3
  claude-sonnet-4-6  → Generation/Refactor/Review,             0.7
  gpt-5.5            → Planning/SecurityReview/Generation,     0.3
  gpt-5.3-codex      → Generation/Refactor/Debug/UnitTest,     0.6
  gpt-5.2            → Orchestration/Review,                   0.8
  gemini-3.1-pro     → Planning/Review/Orchestration,          0.5
  gemini-3.5-flash   → Boilerplate/Explain/Orchestration,      1.2

The 0.3 weight on frontier arms keeps them competitive on
SecurityReview / Planning despite $4+/Mtok; 1.2 on Gemini Flash
penalizes cost more so it only wins when cost is genuinely
decisive (boilerplate, explain).

Mechanism: extracted applyFamilyDefaults into defaults.go and call
it from Router.RegisterArm. Single source of truth — both local
discovery and the primary-provider path in cmd/gnoma/main.go now
flow through the same defaults application. Removed the duplicate
apply block from RegisterDiscoveredModels.

Legacy model IDs (claude-opus-4-20250514, gpt-4o, o3, gemini-2.5-pro,
etc.) intentionally do not match any table entry — keeps users on
pinned older models safe from imposed 2026 Strengths.

R-5: gpt-5.3-codex registration.

  - internal/provider/openai/provider.go: added to fallbackModels
    and inferOpenAIModelCapabilities (400K context, 32K output).
  - internal/provider/ratelimits.go: gpt-5.3-codex and its dated
    alias gpt-5.3-codex-2026-02-15 added with the same Tier 1
    quotas as gpt-5.2.

Gemini 3.x (3.1-pro-preview, 3.5-flash, 3.1-flash-lite) was already
registered in both google/provider.go and ratelimits.go — no change
needed for that part of R-5.

Test coverage:
- ResolveFamilyDefaults table-driven across all 7 cloud entries
  including prefix-sharing (gpt-5.5-pro → gpt-5.5 defaults,
  gemini-3.1-pro-preview → gemini-3.1-pro defaults).
- Legacy IDs return !ok.
- RegisterArm applies cloud defaults end-to-end.
- User-supplied Strengths and CostWeight are not overridden.
- ID.Model() fallback works when ModelName is empty (test code
  often constructs arms this way).

Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
2026-05-23 21:39:48 +02:00
vikingowl 1606d19366 feat(subprocess/codex): account for cached and reasoning tokens
codex 0.133.0 emits two token-accounting fields at top level that
we previously dropped:

  cached_input_tokens   — subset of input_tokens that hit the prompt
                          cache (cheaper, but still counted in
                          input_tokens per OpenAI Responses API
                          semantics)
  reasoning_output_tokens — separately reported billable thinking
                          tokens on reasoning-capable models

Map cached_input_tokens to message.Usage.CacheReadTokens and subtract
it from InputTokens. message.Usage.Add() sums InputTokens and
CacheReadTokens as peers, so the uncached residual goes in
InputTokens — matches the anthropic provider's convention and keeps
cumulative usage tracking arithmetically correct.

Fold reasoning_output_tokens into OutputTokens for accurate cost
tracking. The top-level peer positioning (vs nested in
output_tokens_details) implies a separately counted billable
quantity, not a subset of output_tokens.

Defensive clamp at zero in case a future codex build reports
cached > input due to schema drift. Includes a verbatim regression
guard against the live 2026-05-22 codex 0.133.0 output to catch
schema changes early.
2026-05-22 13:35:57 +02:00
vikingowl ea1a5361e2 chore: restore agy JSON-output TODO; idiomatic t.TempDir() in google test
The worktree commit 12a6b83 dropped the "Native agy JSON output"
backlog item alongside removing the agy agent. Since we restored
agy in this branch, the TODO is relevant again — agy v1.0.0 still
emits plain text and the prompt-augmentation fallback should be
replaced by --output-format stream-json once the CLI supports it.

Switch TestTryLoadOAuthCredentials_Formats to t.TempDir() to drop
the unchecked os.RemoveAll defer that golangci-lint's errcheck
caught after the merge.
2026-05-22 12:17:10 +02:00
vikingowl 246997c4be Merge branch 'feat/agy-sdk-integration' into dev
Brings in the Google auth precedence work (agy > gemini > ADC
credential walk, fileTokenProvider expiry handling, slog-backed
error reporting), the Codex CLI integration as a new subprocess
agent, and the restoration of the agy subprocess agent that was
accidentally removed by the initial codex commit. Sandbox-bypass
flags on both agy and codex are now opt-out via env vars
(GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX).

Includes review-driven fixes:
- ADC fallback now uses real DetectOptions (cloud-platform scope)
- fileTokenProvider returns an error on expired tokens instead
  of shipping a known-dead bearer
- TestNew_Precedence asserts which credential was actually picked
- codex parser tolerates non-JSON banner / debug lines on stdout
- codex usage takes max(input_tokens, prompt_tokens) so accounting
  can't silently undercount

No conflicts expected with the dev image-content feature: the
worktree branch only touches the google and subprocess provider
families.
2026-05-22 12:15:32 +02:00
vikingowl afc31b0af4 fix(subprocess): restore agy alongside codex; env-gate sandbox bypass
The original commit on this branch replaced the agy subprocess agent
with codex (overwriting the slot in knownAgents, deleting agy_test.go
and the agyParser). That was unintentional — agy (antigravity) is a
distinct CLI from codex (OpenAI's). Antigravity will replace gemini
when gemini retires on 2026-06-16, so it needs to keep its own slot.

Restored: FormatAgyText constant, agyParser with newAgyParser and
the line-delimited text parser, the agy CLIAgent entry in
knownAgents with PromptResponseFormat:true, agy_test.go, and the
agy case in newParser. Sourced from the parent commit so behavior
matches what shipped before the codex change.

Sandbox bypass: both agy (--dangerously-skip-permissions) and codex
(--dangerously-bypass-approvals-and-sandbox) need a flag to run
non-interactively (their stdin is closed; without it they block on
approval prompts nobody can answer). Both default to ON for
out-of-box behavior; operators with pre-approved trust config can
opt out via GNOMA_AGY_BYPASS_PERMISSIONS=0 or
GNOMA_CODEX_BYPASS_SANDBOX=0. Tests cover the on / opt-out / unknown
value branches.

TestKnownAgents_ValidFormats updated to accept the restored
FormatAgyText.
2026-05-22 12:14:54 +02:00
vikingowl 1717f9f567 fix(subprocess/codex): tolerate non-JSON stdout, max-of-token-paths
Codex emits banner / debug / "starting turn" lines to stdout
interleaved with the JSON event stream. The parser previously
returned an error on any line that wasn't a JSON object, which
subprocessStream.Next treats as terminal — one stray banner
aborted the whole turn. Skip lines that don't start with `{`
after whitespace trim, and downgrade unparseable JSON-looking
lines to a slog.Debug so they don't kill the stream either.

Token accounting: usage payloads from newer codex builds
occasionally carry both input_tokens and prompt_tokens (and
likewise output / completion) with slightly different values.
Always use the larger of the two so we can't silently undercount.

Tests cover non-JSON banner skipping, malformed-JSON
non-fatal-skip, and the max() behavior with both token
fields populated.
2026-05-22 12:08:32 +02:00
vikingowl f83ace7ad6 fix(google): real ADC scopes, expired-token rejection, error reporting
credentials.DetectDefault(nil) always returns "options must be
provided", which made the ADC branch unreachable. Pass an explicit
DetectOptions with the cloud-platform scope so users with
GOOGLE_APPLICATION_CREDENTIALS or `gcloud auth application-default
login` actually flow through ADC instead of falling out as
"no credentials found".

fileTokenProvider.Token used to return expired tokens unchanged.
We don't perform an OAuth refresh exchange (the upstream CLI does
that out-of-band into the file we read), so when the file isn't
fresh the only safe move is to fail loudly with an actionable
message rather than ship a known-dead bearer that genai forwards
to Vertex AI and gets back a confusing 401.

tryLoadOAuthCredentials previously swallowed all errors equally,
so the precedence walker silently skipped past misconfigured files
(chmod 0600 on the wrong user, half-written JSON, etc.). Now
os.IsNotExist is silent (normal walking), everything else gets a
slog.Warn with the path so an unreadable file is visible.

selectOAuthCredentials extracts the precedence chain into a
testable helper that also returns a CredentialSource tag
identifying which path was chosen. The previous precedence test
only asserted err == nil; the new test verifies that the agy file
wins when both are present and that the fallback to gemini
actually loads the gemini token.
2026-05-22 12:08:22 +02:00
vikingowl c5cc98ed8a feat(provider/openai): translate user image content to image_url parts
When the user message has at least one ImageContent block, build a
ChatCompletionContentPartUnionParam array with text + image_url
parts instead of the string content path. Image bytes are inlined
as a base64 data URL (data:<media-type>;base64,...). Adjacent text
blocks are merged into a single TextContentPart. Pure-text user
messages stay on the existing string fast path.

This covers OpenAI direct + every openaicompat backend (Ollama,
llama.cpp, llamafile) since they all share the same provider.

Tests: pure text uses OfString; image present emits 2 content parts
(text + image_url with the expected base64 payload); nil-Image
blocks are dropped and adjacent text merges correctly.
2026-05-22 11:50:55 +02:00
vikingowl 12a6b83cc9 feat: implement Google auth precedence and Codex integration 2026-05-22 00:21:32 +02:00
vikingowl 99fa0ff08e refactor(providers): refresh defaults to current 2026 model lineup
Bump hard-coded provider defaults to the May 2026 lineup:

- Anthropic: claude-sonnet-4-6 (default); Opus 4.7 and Haiku 4.5 in
  the fallback list. 4.6/4.7 generation has 1M context standard.
- OpenAI: gpt-5.5 (default); 5.5-pro / 5.2 / 5.2-chat-latest in
  fallback. ThinkingModes now baseline on GPT-5.x.
- Google: gemini-3.5-flash (default); 3.1 Pro / Flash Lite in fallback.
- Mistral: mistral-large-latest unchanged (Mistral Large 3); add
  mistral-medium-3.5, mistral-medium-2511, mistral-large-2512 to the
  rate-limit map.

Legacy dated IDs retained in fallback lists and ratelimits maps so
configs pinned to claude-sonnet-4-20250514 / gpt-4o / gemini-2.5-flash
keep resolving. Capability tables (ContextWindow, MaxOutput,
ThinkingModes) updated to match each generation. CLI help text in
cmd/gnoma/main.go also updated.
2026-05-20 03:13:21 +02:00
vikingowl c4fde583f5 chore(lint): gofmt sweep + errcheck cleanups in router discovery
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
2026-05-20 03:13:05 +02:00
vikingowl c8813768d5 fix(subprocess): harden agy CLI integration
- Drop unverified JSONOutput/Vision capability claims on agy (no native
  stream-json, no image-input path on v1.0.0).
- Replace agent.Name == "agy" check with PromptResponseFormat flag on
  CLIAgent so the prompt-augmented JSON fallback scales to future agents.
- Pass --dangerously-skip-permissions in agy PromptArgs to parallel
  gemini --yolo / vibe --trust; required for non-interactive runs.
- Nil-guard JSONSchema and Schema bytes in buildPrompt (previously
  panicked when ResponseJSON was requested without a schema).
- Rename misleading TestAgyProvider_StreamAugmentation to
  TestAgyParser_EmitsLineDeltas; add coverage for nil-schema path and
  non-augmenting agents.
2026-05-20 01:29:05 +02:00
vikingowl 3c875276c9 feat(security): implement multi-wave audit remediation and agy provider support
Implemented full security remediation following Universal Security Pilot protocol:
- W1: Enforced SecureProvider at router and engine boundaries to prevent bypasses.
- W1: Implemented path-sensitive policy for MCP tools.
- W2: Added SHA256 hash verification for SLM downloads (llamafile).
- W3: Enhanced secret redaction for private keys (full body) and high-entropy strings.
- W4: Fixed symlink-based filesystem sandbox escapes in paths and grep.
- W4: Documented CLI agent trust boundaries.

Also added 'agy' (Antigravity) as a subprocess CLI provider with plain-text JSON schema support.
2026-05-20 01:13:13 +02:00
vikingowl 17d83f2e2a feat: add agy CLI provider and support structured output via prompt augmentation 2026-05-20 00:21:03 +02:00
vikingowl b331dcd61a feat(subprocess): per-agent binary override via [cli_agents] config
Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.

Users with aliased CLI binaries (claude-priv, claude-work,
gemini-personal) can now point gnoma's auto-discovery at them
without renaming. The override flows through to the actual subprocess
spawn at internal/provider/subprocess/provider.go:56, so routing
through the alias is functional, not cosmetic.

Config:
  [cli_agents]
  claude = "claude-priv"   # discovery uses claude-priv instead of claude
  gemini = ""              # empty value = no override (fall back to canonical)
  # vibe is absent = canonical name used

- internal/config/config.go: CLIAgentsSection map[string]string;
  TOML [cli_agents] key.
- internal/provider/subprocess/agent.go:
  - Package-level lookPath = exec.LookPath for test injection.
  - resolveAgentBinary(canonical, override) → (path, binName, err).
    Override='' falls back to canonical. Override set but missing from
    PATH returns an error (no silent fallback — masks user typos).
  - DiscoveredAgent.OverrideBinary records the override binary name
    when one was used; empty otherwise.
  - DiscoverCLIAgents(ctx, overrides) signature; warning logged when
    an override is configured but the binary isn't on PATH.
- cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The
  `gnoma providers` listing renders `claude-priv (via [cli_agents].claude)`
  when an override is in effect.

Tests cover: 5 resolver cases (no override, override set, empty
override falls back, override missing, canonical missing); 4
discovery cases (no overrides, override resolves alias, empty value
falls back, override missing skips agent); 2 config round-trip cases.
2026-05-19 21:02:16 +02:00
vikingowl 9388479b03 feat(openai): lexical repair for malformed tool-call arguments
Local-model servers (Ollama, llama.cpp, llamafile) routed through the
OpenAI-compatible path frequently emit tool-call arguments that are
*almost* valid JSON — wrapped in markdown fences, padded with prose, or
trailing a stray comma. Strict parsing fails, the engine receives empty
args, and the agent loop has to retry or escalate.

Adds repairArgs(raw) at the EventToolCallDone boundary: strict-parse
first, then apply cheap lexical fixes (strip ```json fences, drop
trailing commas before }/], extract the first balanced {...} block with
proper string/escape awareness). On success, the repaired bytes flow
through unchanged; on failure, the original is returned and downstream
parsing surfaces the error as before.

Frontier providers (OpenAI proper, Anthropic, Mistral, Google) are
unaffected — their SDKs return structured args that pass strict parse.
The repair only does work when the upstream output is malformed.

11 unit tests cover: valid passthrough, empty, trailing commas,
single/double-line fences, prose-wrapped, braces-inside-strings,
multiple top-level objects (takes the first), and unrepairable input.
A stream-level test verifies the wiring through flushNextToolCall.
2026-05-19 17:59:05 +02:00
vikingowl ec9433d783 chore(lint): clear remaining errcheck and staticcheck findings
Brings the project to a clean `make lint` baseline (0 issues).

Mechanical:
- Wrap deferred resp.Body.Close() in closures (router/discovery.go,
  router/probe.go) so the unchecked return surfaces as `_ = ...`.
- Apply `_ = ...` (single or multi-return blank) to test-file calls
  that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir
  in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send /
  LoadDir in tests that assert on side effects.

Structural:
- engine.handleRequestTooLarge drops the unused req parameter and
  rebuilds the request from compacted history (SA4009 — argument was
  overwritten before first use).
- provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch
  to tagged switches over the discriminator (QF1002).
- tui.app.go MouseWheel + inputMode and cmd/gnoma main slm-status use
  tagged switches in place of equality chains (QF1003).
- cmd/gnoma main.go merges a var decl with its immediate assignment
  (S1021).
- Three empty-branch sites (dispatcher_test, loader_test,
  coordinator_test) become real assertions or get the dead `if` removed
  (SA9003).
2026-05-19 17:53:42 +02:00
vikingowl 13b2f5e14d chore(lint): clear dead code and tighten lifecycle errcheck
Removes five unused funcs/vars/fields that golangci-lint had been
flagging (anthropic.toolCallDoneEvent, mistral.translateMessages,
hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase),
two ineffectual assignments (tui/rendering.go visible-window loop,
subprocess stream_test setup), and a stale if/HasPrefix that's now a
strings.TrimPrefix.

Wires errcheck onto every subprocess / stream lifecycle path so a
failed close or shutdown is at least logged rather than silently
dropped:

- engine/loop.go: stream.Close on both the error and success paths
- mcp/manager.go: Shutdown when StartAll partial-fails; Transport
  close after Initialize failure
- mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout
  fallback
- slm/download.go: Close propagated as a named-return error on the
  success path; explicitly discarded on the rollback path
- slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go,
  config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit
  ignores or error logging on Close / Shutdown / WalkDir / Scanln

Production-code errcheck and ineffassign are now zero. Remaining
golangci-lint output is test-only Close-in-defer noise plus
stylistic staticcheck QF suggestions, left alone.
2026-05-19 17:05:54 +02:00
vikingowl 0d2d825e52 feat: add dynamic model discovery within providers
- OpenAI provider: use Models.ListAutoPaging() to discover available models
- Anthropic provider: use Models.ListAutoPaging() to discover available models
- Google provider: use Models.All() iterator to discover available models
- All providers fall back to hardcoded lists if API calls fail
- Add capability inference functions for each provider based on model ID
- Add tests for model discovery fallback behavior

This enables gnoma to dynamically discover new models as they become available
from cloud providers, while maintaining backward compatibility with fallback
lists for offline use or API failures.
2026-05-07 22:27:24 +02:00
vikingowl a9213ec382 feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
  heuristic baseline blended so Priority/RequiredEffort are never zeroed,
  extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
  filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
  registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
2026-05-07 16:44:32 +02:00
vikingowl 44d0bdc032 feat(provider): subprocess CLI provider for claude, gemini, vibe
Adds internal/provider/subprocess — a provider.Provider that spawns CLI
agents (claude, gemini, vibe) as subprocesses and streams their output.

- FormatParser interface + three parsers for claude-stream-json,
  gemini-stream-json, and vibe-streaming formats; fixtures captured from
  real binaries
- subprocessStream: pull-based stream.Stream over subprocess stdout with
  bounded stderr capture (8KB) and guarded reap() to prevent double-Wait
- DiscoverCLIAgents: parallel PATH scan with 10s timeout, stable ordering
- Provider: only the last user message is passed as --prompt; all other
  request fields (history, tools, system prompt) are intentionally ignored
  (see package doc)
- main.go: discover and register CLI arms at startup; TODO(P0c) for
  tier-based routing to enforce preference order explicitly
2026-05-07 14:29:34 +02:00
vikingowl 7fbb5454ee feat(router): normalize effort/thinking abstraction across providers
Add EffortLevel (auto/low/medium/high) as a provider-agnostic reasoning
control, replacing the Capabilities.Thinking bool. Each provider maps
the level to its native parameter: Anthropic budget tokens (1K/8K/16K),
OpenAI reasoning_effort (low/medium/high), Google thinking budget
(1K/8K/16K). Task classification auto-infers effort from TaskType and
complexity; filterFeasible excludes arms that lack the required level.
2026-05-07 14:08:50 +02:00
vikingowl d71bd942c4 feat: local model reliability — SDK retries, capability probing, init skill, context compaction
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
  subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
  activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
  used API format (fs_ls) — now re-sanitized in translateMessage

Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
  (Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
  vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
  replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
2026-04-13 02:01:01 +02:00
vikingowl 2093beea58 fix: deterministic 500 retry, OpenAI error wrapping, local /init prompt
Stop retrying llama.cpp 500s that are deterministic tool-parse failures
by inspecting the error message body (ClassifyHTTPError). Wrap OpenAI SDK
errors as ProviderError so the engine's retry logic classifies them. Add
localInitPrompt for local models that uses sequential fs_* calls instead
of spawn_elfs (which local models can't produce reliably).
2026-04-12 18:35:18 +02:00
vikingowl 4f1e0cf567 feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
  args in the first streaming chunk then repeats them as delta, causing
  doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output

cmd/gnoma:
- Early TTY detection so logger is created with correct destination
  before any component gets a reference to it (fixes slog WARN bleed
  into TUI textarea)

permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
  text may legitimately mention .env/.ssh/credentials patterns and
  should not be blocked

tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
  (ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
  content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
  and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
  <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
  from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
2026-04-05 19:24:51 +02:00
vikingowl e1a47a7620 feat: rate limit pools, elf tree view, permission prompts, dep updates
Rate limits:
- Add PoolRPS/PoolTPM/PoolTokensMonth/PoolCostMonth pool kinds
- Provider defaults for Mistral/Anthropic/OpenAI/Google (tier-aware)
- Config override via [rate_limits.<provider>] TOML section
- Pools auto-attached to arms on registration

Elf tree view (CC-style):
- Structured elf.Progress type replaces flat string channel
- Tree with ├─/└─ branches, per-elf stats (tool uses, tokens)
- Live activity updates: tool calls, "generating… (N chars)"
- Completed elfs stay in tree with "Done (duration)" until turn ends
- Suppress raw elf output from chat (tree + LLM summary instead)
- Remove background elf mode (wait: false) — always wait
- Truncate elf results to 2000 chars for parent context
- Parallel hint in system prompt and tool description

Permission prompts:
- Show actual command in prompt: "bash wants to execute: find . -name '*.go'"
- Compact hint in separator bar: "⚠ bash: find . | wc -l [y/n]"
- PermReqMsg carries tool name + args

Other:
- Fix /model not updating status bar (session.Local.SetModel)
- Add make targets: run, check, install
- Update deps: BurntSushi/toml v1.6.0, chroma v2.23.1, x/text v0.35.0, cloud.google.com/go v0.123.0
2026-04-03 20:54:48 +02:00
vikingowl 9608436b52 feat: add OpenAI-compat adapter for Ollama and llama.cpp
Thin wrapper over OpenAI adapter with custom base URLs.
Ollama: localhost:11434/v1, llama.cpp: localhost:8080/v1.
No API key required for local providers.

Fixed: initial tool call args captured on first chunk
(Ollama sends complete args in one chunk, not as deltas).

Live verified: text + tool calling with qwen3:14b on Ollama.
Five providers now live: Mistral, Anthropic, OpenAI, Google, Ollama.
2026-04-03 13:47:30 +02:00
vikingowl dccb5fe65a feat: add Google GenAI provider adapter
Streaming via goroutine+channel bridge (range-based iter.Seq2 → pull
iterator). Tool use with FunctionCall/FunctionResponse, tool name
sanitization, tool name map for FunctionResponse correlation.
Stop reason override (Google uses STOP for function calls).

Hardcoded model list (gemini-2.5-pro/flash, gemini-2.0-flash).
Wired into CLI with GOOGLE_API_KEY + GEMINI_API_KEY env support.

Live verified: text streaming + tool calling with gemini-2.5-flash.
Four providers now live: Mistral, Anthropic, OpenAI, Google.
2026-04-03 13:42:29 +02:00
vikingowl 261c19f90f feat: add OpenAI provider adapter
Streaming, tool use (index-based delta accumulation), tool name
sanitization (fs.read → fs_read), StreamOptions.IncludeUsage for
token tracking. Hardcoded model list (gpt-4o, gpt-4o-mini, o3, o3-mini).

Wired into CLI with OPENAI_API_KEY env support.
Live verified: text streaming + tool calling with gpt-4o.
2026-04-03 13:33:55 +02:00
vikingowl 9e7caf2467 feat: add Anthropic provider adapter
Streaming, tool use (with InputJSONDelta assembly), thinking blocks,
cache token tracking, system prompt separation. Tool name sanitization
(fs.read → fs_read) for Anthropic's naming constraints with reverse
translation on tool call responses.

Hardcoded model list with capabilities (Opus 4, Sonnet 4, Haiku 4.5).
Wired into CLI with ANTHROPIC_API_KEY + ANTHROPICS_API_KEY env support.

Also: migrated Mistral SDK to github.com/VikingOwl91/mistral-go-sdk.

Live verified: text streaming + tool calling with claude-sonnet-4.
126 tests across 9 packages.
2026-04-03 13:11:00 +02:00
vikingowl c54471a37b refactor: migrate mistral sdk to github.com/VikingOwl91/mistral-go-sdk
Same package, new GitHub deployment with fixed tests.
somegit.dev/vikingowl → github.com/VikingOwl91, v1.2.0 → v1.2.1
2026-04-03 12:06:59 +02:00
vikingowl 69f5dba091 feat: complete M1 — core engine with Mistral provider
Mistral provider adapter with streaming, tool calls (single-chunk
pattern), stop reason inference, model listing, capabilities, and
JSON output support.

Tool system: bash (7 security checks, shell alias harvesting for
bash/zsh/fish), file ops (read, write, edit, glob, grep, ls).
Alias harvesting collects 300+ aliases from user's shell config.

Engine agentic loop: stream → tool execution → re-query → until
done. Tool gating on model capabilities. Max turns safety limit.

CLI pipe mode: echo "prompt" | gnoma streams response to stdout.
Flags: --provider, --model, --system, --api-key, --max-turns,
--verbose, --version.

Provider interface expanded: Models(), DefaultModel(), Capabilities
(ToolUse, JSONOutput, Vision, Thinking, ContextWindow, MaxOutput),
ResponseFormat with JSON schema support.

Live verified: text streaming + tool calling with devstral-small.
117 tests across 8 packages, 10MB binary.
2026-04-03 12:01:55 +02:00
vikingowl 788bd8ec24 feat: add foundation types, streaming, and provider interface
internal/message/ — Content discriminated union, Message, Usage,
StopReason, Response. 22 tests.

internal/stream/ — Stream pull-based iterator interface, Event types,
Accumulator (assembles Response from events). 8 tests.

internal/provider/ — Provider interface, Request, ToolDefinition,
Registry with factory pattern, ProviderError with HTTP status
classification. errors.AsType[E] for Go 1.26. 13 tests.

43 tests total, all passing.
2026-04-03 10:57:54 +02:00