gnoma

Author	SHA1	Message	Date
vikingowl	352cab4a94	docs(todo): extend config-migration plan with project registry Adds item #5 to the config write/merge corruption entry: ~/.config/gnoma/projects.json tracking which directories gnoma has been launched in. Enables doctor --all-projects, cross-project session listing, and one-shot upgrade-config across all known projects. Documents the design constraints: must use the same omitempty / atomic-write discipline as the encoder fix to avoid recreating the class of bug it exists to help solve. Privacy footprint flagged (local-only directory log; opt-out toggle). Stale-entry handling gated through doctor, not auto-prune. v0.3.2	2026-05-24 22:29:56 +02:00
vikingowl	58f4001917	docs(todo): track config write/merge corruption + doctor/upgrade design setConfig() serializes the entire Config struct on every key change, which writes zero-valued fields into the file. On the next load those explicit zeros override higher-priority layers via toml.Decode's present-beats-absent semantics. Concrete symptom today: a global prefer = 'cloud' was silently shadowed by a project prefer = ''. Captures the multi-part fix surface so it doesn't get half-done: - Stop generating zero-spam (omitempty hybrid or pelletier swap). - gnoma doctor: read-only diagnostic (zero-spam, invalid enums, removed keys, effective-merged values). - gnoma upgrade-config: active migration with .bak backup + diff. - Auto-migrate project-level on startup with TUI banner notice; global stays explicit.	2026-05-24 22:24:59 +02:00
vikingowl	6c5e969217	feat(tui): add /router command for runtime routing-preference switch Mirrors the pattern of /permission: bare command shows the current value plus a help line; with an argument (auto/local/cloud) it calls Router.SetPreferPolicy and emits a system message. Session-only — does not write back to config.toml, matching /permission and Ctrl+X incognito-toggle conventions. Tab completion on the value via routerPreferModes alongside the existing permissionModes pattern. Help text updated. Status-bar indicator deferred (separate concern if it turns out to be wanted).	2026-05-24 22:13:27 +02:00
vikingowl	74bd570438	fix(tui): de-dupe /init in command picker; skill names shadow builtins /init appeared twice in the completion picker — once from the static builtinCommands list and once from the bundled init skill at internal/skill/skills/init.md (registered via skills.All()). Two changes: - Remove /init from builtinCommands. The skill provides the canonical entry, and its description ('Generate or update AGENTS.md project documentation') is more accurate than the static one ('initialize project — create AGENTS.md') because the skill handles both create and update. - Refactor completionSource() so a skill name silently shadows any builtin with the same name. Prevents this from recurring if a future builtin migrates to a skill, and lets users override a builtin's description by dropping a skill of the same name into .gnoma/skills/.	2026-05-24 22:08:46 +02:00
vikingowl	d38d7daf25	fix(subprocess/agy): disable ToolUse until stream-json lands agy is registered with FormatAgyText and the agyParser emits every stdout line as a plain EventTextDelta. There is no path for a structured ToolCall event to come back. With ToolUse=true the router would dispatch tool-needing tasks (security_review, spawn_elfs, file edit) to agy; the underlying Gemini model would describe calling the tool in prose — invented UUIDs and 'I will pause now'-style stubs — the engine would receive only text, and the turn would hang waiting for a tool call that never arrives. Surfaced when /init routed to agy for a security_review task and elf spawning visibly hallucinated in the TUI. Capability flag flipped to false; agy stays usable for tool-free prompts (explain, summarize, simple chat). TODO entry for native stream-json updated to flag that the capability flip is part of that same change.	2026-05-24 21:58:22 +02:00
vikingowl	06d4069076	ci: pin GoReleaser to the triggering tag, fix tag-collision regression When v0.3.1 was tagged on the same commit as v0.3.1-rc2, the release workflow built and tried to publish rc2 artifacts instead of v0.3.1, failing with 'already_exists' on every asset upload. Root cause: goreleaser-action@v6 + 'version: latest' (locked to v2.x) falls back to 'git describe --tags' for the current tag, which picked v0.3.1-rc2 over v0.3.1 when both refs pointed at HEAD. Explicitly setting GORELEASER_CURRENT_TAG = github.ref_name forces the workflow to use the tag that triggered it, regardless of other refs at the same commit. v0.3.1	2026-05-24 17:36:01 +02:00
vikingowl	f641bd4971	docs(todo): track bandit selector design questions Two related items surfaced from the r/coolgithubprojects v0.3.1 launch thread. Bundled because they share the selector code: 1. Whether to keep numeric EMA at all post-SLM dispatcher (open strategic question from the 2026-05-07 roadmap — not a must-implement). 2. Surfacing hardcoded selector knobs (qualityAlpha, blend ratio, strength bonus, quality floor) as [router.bandit] config keys — ships independently of #1.	2026-05-24 17:34:13 +02:00
vikingowl	798f2ab3c3	fix(release): prerelease auto-detect; changelog excludes scoped conventional commits Two polish issues surfaced by the v0.3.1-rc1 pipeline test: - The release was tagged v0.3.1-rc1 but published without the prerelease flag, so it appeared alongside stable releases. Add 'prerelease: auto' to release.github so GoReleaser marks any tag with a semver prerelease suffix (-rc, -beta, -alpha, -pre) appropriately. - The changelog filters used '^docs:' patterns that only match bare conventional commits. Scoped variants like 'docs(readme):' and 'chore(make):' slipped through into the published changelog. Switch to '^docs[:(]' style patterns to match both forms, and add '^style[:(]' so gofmt-drift commits are excluded too. v0.3.1-rc2	2026-05-24 17:05:49 +02:00
vikingowl	9814795b3c	ci: migrate release pipeline from Woodpecker to GitHub Actions Drop the broken .woodpecker/release.yml (top-level when: triggered an 'error' status on every dev push instead of skipping non-tag events) and replace with .github/workflows/release.yml driving the same GoReleaser flow. Rationale: - Release artifacts already land on GitHub (releases + ghcr.io), so running the pipeline on GitHub eliminates a build hop. - GH Actions auto-provides GITHUB_TOKEN with packages:write via the workflow permissions block; no PAT plumbing or login secrets. - docker/setup-qemu-action and docker/setup-buildx-action handle the multi-arch cross-build setup that Woodpecker would require manual host configuration for. Trigger: any tag matching refs/tags/v*. Mirror sync from somegit.dev propagates tags to GitHub, so 'git push origin v0.3.1' on the canonical remote still drives the GitHub-side release. v0.3.1-rc1	2026-05-24 16:45:17 +02:00
vikingowl	047924da2b	ci(woodpecker): release pipeline on vX.Y.Z tag Runs 'go test ./...' then 'goreleaser release --clean' inside the official goreleaser image when a tag matching refs/tags/v* is pushed. GITHUB_TOKEN comes from the 'github_token' repo secret (needs repo + write:packages scopes) and is reused for ghcr.io docker login so the multi-arch image build can push. Runner requirements documented inline: docker socket access plus QEMU registered on the host (tonistiigi/binfmt --install all) for arm64 cross-builds. Directory form chosen so a non-release CI pipeline can land later under .woodpecker/ci.yml without restructuring.	2026-05-24 16:38:24 +02:00
vikingowl	a23eb6b92c	style: gofmt drift from prior commits Pure whitespace cleanup surfaced when 'make check' ran gofmt over the tree. Mostly struct-field column alignment in internal/safety/banner.go (SessionInfo) and the var(...) flag block in cmd/gnoma/main.go after --dangerously-allow-anywhere was added without realignment. Verified zero substantive changes via 'git diff --ignore-all-space --ignore-blank-lines'.	2026-05-24 16:33:17 +02:00
vikingowl	0981fb82d6	chore(make): add govulncheck and semgrep to 'make check' Both checks already passed locally on the current dev tip; wiring them into the canonical pre-commit gate so security regressions fail fast instead of leaking into a release. - 'make vuln' runs govulncheck with reachability analysis against the Go vuln DB. - 'make sec' runs semgrep with p/golang + p/security-audit, metrics off, --error so findings exit non-zero. Tools must be installed locally (commands in Makefile comments). If upstream Woodpecker CI runs 'make check', it will need both binaries on the runner image.	2026-05-24 16:30:54 +02:00
vikingowl	3888966e68	fix(deps): bump golang.org/x/net to v0.55.0 to clear reachable CVEs govulncheck flagged two reachable vulnerabilities in golang.org/x/net@v0.52.0: - GO-2026-5026 (idna fails to reject ASCII-only Punycode labels), reached via router.DiscoverOllama -> http.Client.Do -> idna.ToASCII. - GO-2026-4918 (HTTP/2 transport infinite loop on bad SETTINGS_MAX_FRAME_SIZE), same call path -> http2.Transport.*. Bumping to v0.55.0 covers both. Transitive bumps to x/crypto v0.51.0, x/sys v0.45.0, x/text v0.37.0. Post-bump govulncheck reports 0 reachable vulnerabilities and 0 in directly imported packages.	2026-05-24 16:27:28 +02:00
vikingowl	847cd5fe0c	fix(security): use crypto/rand for session-ID suffix Semgrep flagged math/rand for the /tmp artifact-directory session-ID generation. Modern Go (1.20+) auto-seeds the global math/rand source so this wasn't exploitable in practice, but crypto/rand is the idiomatic choice for any security-adjacent identifier and removes the finding from future security audits. Drops the mrand alias entirely; reads 8 random bytes once and masks to 24 bits to preserve the existing %06x suffix format.	2026-05-24 16:22:50 +02:00
vikingowl	001865f069	fix(env): correct ANTHROPIC_API_KEY typo, add missing vars The placeholder ANTHROPICS_API_KEY (with trailing S) silently failed: the auth layer reads ANTHROPIC_API_KEY, so anyone copying .env.example to .env and pasting their key would see gnoma never pick it up, with no clear error. Also surfaces vars that already work but weren't templated: GOOGLE_API_KEY (alternative to GEMINI_API_KEY), GNOMA_PROVIDER and GNOMA_MODEL (config overrides), and the two subprocess sandbox bypass footguns (GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX), left commented out so they don't accidentally turn on.	2026-05-24 16:16:39 +02:00
vikingowl	c1c52f139d	docs(readme): add 'no phone-home' bullet and data-flow scope note Clarify that gnoma itself emits no telemetry to external services while being explicit that cloud-provider arms send data to those providers by design. Adds: - 'No phone-home' bullet to the differentiator list, naming the on-device path (Ollama/llama.cpp + --incognito). - 'Data flow' paragraph to the Security scope-note blockquote so the framing is consistent between the hero bullets and the Security section.	2026-05-24 16:00:40 +02:00
vikingowl	7040041f13	docs(readme): correct firewall scope; track egress controls in TODO The 'What makes gnoma different' bullet and Security section both implied a network-egress firewall. Today the Firewall only enforces a content boundary (secret scan, Unicode sanitize, redact/block). Reword both spots and add a Scope note. Surface the gap as a top-of-TODO entry covering per-session audit log and per-host egress allowlist, with the open design question (host-level vs per-tool) called out. Raised via r/SideProject v0.3.0 launch thread.	2026-05-24 15:50:35 +02:00
vikingowl	1828151162	docs(claude): big-picture architecture and expanded test commands Add a 'Big picture' section summarising the request flow (cmd → session → engine → router → security/permission → extensibility) so future Claude Code instances can orient without reading INDEX.md plus five package directories first. Note that internal/safety and internal/slm aren't in INDEX.md yet. Document the somegit.dev / GitHub mirror split and the ruleset that blocks force-push and deletion on main/dev. Expand build/test section with make check, make test-integration, single-test, and benchmark commands.	2026-05-24 15:39:23 +02:00
vikingowl	b5062d59e9	docs(readme): hero screenshot, differentiators, status, TOC Add docs/img/gnoma-tui.png as a hero image so visitors see the TUI above the fold instead of a wall of text. Pull the bandit router, prefer-policy, SLM, and built-in firewall out of buried sections into a 'What makes gnoma different' bullet list. Add a Status block flagging pre-1.0 and a table of contents. Move the pygmy-owl naming note and upstream/mirror URLs into a footer About section.	2026-05-24 15:39:14 +02:00
vikingowl	b13a6a2801	docs(plans): mark v0.3.0 plans shipped Three plans shipped end-to-end in v0.3.0; removing them from TODO.md In-flight and adding a Status: shipped header to each plan doc with the commit references. Shipped: - 2026-05-23-routing-defaults-refresh.md - 2026-05-23-prefer-routing-policy.md - 2026-05-23-startup-safety-banner.md Still in flight (telemetry-gated, fires only if measurements support it): - 2026-05-23-tool-router-specialization.md	2026-05-23 22:45:05 +02:00
vikingowl	8ba77c1685	fix(safety): env-template precision, label alignment, banner on bypass Three polish items surfaced during the maintainer's manual smoke of the previous safety commit. env-template precision (false-positive fix): The "env file" rule matched .env.* universally, which flagged conventional templates like .env.example / .env.sample / .env.template / .env.dist / .env.default — these hold variable NAMES, no values, and are commonly committed. Now skipped. Real env files (.env, .env.local, .env.production) still match. New envTemplateSuffixes table + isEnvTemplate helper; check runs only inside the env-file rule so the suffix denylist is scoped. Tests added for both directions: 6 templates that must NOT flag, 6 real env files that must. Banner label alignment: Field labels were padded to 8 chars except "sensitive" at 9, producing visible misalignment in the rendered banner: cwd : /... provider : ollama / ... sensitive : 0 matches in cwd <- one extra space Padded all labels to 9 chars so the ":" separators line up. Context banner on bypass: --dangerously-allow-anywhere previously suppressed the entire safety block, including the informational context banner. Bypassing the GATE is not the same as opting out of the info — the user still wants to see cwd / git state / sensitive files nearby. Restructured the safety block so classification + banner always run; the bypass only skips the refuse/warn FLOW. The bypass warning log now also includes the classified tier and cwd path for diagnostics. v0.3.0	2026-05-23 22:32:26 +02:00
vikingowl	c483656681	docs(plans): fix gnoma one-shot invocation in safety-banner plan gnoma takes the prompt as a positional argument, not via -p (that's Claude Code's syntax). Surfaced when the maintainer tried the manual smoke from the plan's "Definition of done" section and hit the "flag provided but not defined: -p" error. before: gnoma -p "test" after: gnoma "test" The same wrong syntax appears in the `f9094f6` / `3eeb5b4` commit messages but those are immutable. This commit also serves as the public record of the typo so future readers don't repeat it.	2026-05-23 22:26:56 +02:00
vikingowl	d206b3cf09	docs: routing-prefer + startup-safety user docs, plan tier-shift note README: - New "Preferring local vs cloud" subsection under "Routing defaults" — table of the three [router].prefer values, priority order against forced arm / incognito / Strengths, and the CLI-agent-counts-as-local clarification. - New "Startup safety check" subsection under "Security" — tier table, [safety] config block, --dangerously-allow-anywhere flag, container detection note, link to the plan doc. Plan doc (prefer-routing-policy): - Approach section updated to describe the tier-shift mechanism that actually shipped, with a clear "Implementation note" explaining why the original score-multiplier approach was abandoned (cost-floor math gives local arms a ~280x raw-score advantage that any reasonable multiplier can't overcome). - CLI-agent placement flipped from "non-local" to "local" with rationale — implementation chose user-facing behavior axis over the privacy axis the original draft used. - Tier-shift rationale table replacing the multiplier rationale. - P-3 task rewritten to reflect the actual implementation (checked off and pointing at the right code), with the policyMultiplier helper noted as a within-tier nudge of limited present effect. The implementation-vs-plan deviation is now documented in both the plan doc and the original feature commit message (`f9094f6`). Future readers reach the same understanding via either path.	2026-05-23 22:23:57 +02:00
vikingowl	3eeb5b46d7	feat(safety): pre-launch cwd classifier + context banner Implements S-1 through S-7 of the startup-safety-banner plan. Adds a pre-launch safety check that classifies the current working directory into three tiers and gates the launch: TierRefuse /, /etc, /sys, /proc, /usr, /var, /bin, /sbin, /boot, /root, /dev (Linux) and /System, /Library, /private, /Applications (macOS). Refuses with exit 2 unless --dangerously-allow-anywhere is passed. TierWarn $HOME, ~/Desktop, ~/Downloads, ~/Documents, ~/.config, ~/.local, ~/.cache, /tmp, and similar dumping grounds. Prints a banner and reads a single y/Y from stdin to confirm; any other input (or EOF, including piped/ scripted invocation) aborts with exit 1. TierOK Anywhere with a recognized project marker (.gnoma/, go.mod, package.json, pyproject.toml, Cargo.toml, Makefile, Dockerfile, build.gradle, pom.xml) or inside a git repo. No prompt; banner only. Project markers and git-repo presence override the TierWarn check — a project dir inside $HOME stays TierOK. The require_project_marker config knob can flip that for strict users. Container detection: when /.dockerenv or /run/.containerenv exists, TierRefuse downgrades to TierWarn (devcontainers often chroot to / or similar). Best-effort; false positives only soften the gate. The context banner is always rendered (TierOK, TierWarn, TierRefuse alike) and summarizes: cwd, git branch + dirty state, project type, provider/model, modes (permission, incognito, prefer), and a top-level sensitive-file inventory. Inventory matches .env, .env., env.local; private-key extensions (.pem, .key, .crt, .p12, .pfx); SSH key names (id_rsa, id_ed25519, ...); credentials files; .netrc / .pgpass; KeePass vaults; and .ssh/ .aws/ .kube/ .gcloud/ .azure/ .docker/ directories. Precision-tested: .envrc and secret_handler.go do NOT match. Bounded at 1000 entries. 
Architecture: - internal/safety/cwd.go — Classification + symlink-resolving tier classifier with platform-specific roots and container detection. - internal/safety/sensitive.go — pattern-based top-level scanner, deterministic ordering, scanLimit guard against pathological dirs. - internal/safety/banner.go — pure render functions for the warn prefix, refuse message, and context banner. Safe for golden-string testing. - internal/config/config.go — new [safety] section with three config keys, defaults applied via ResolvedSafety() helper. Pointer fields distinguish "user omitted" from "user set to false." - cmd/gnoma/main.go — gate runs after subcommand dispatch (so `gnoma providers / profile / slm / router` skip the prompt) and before provider creation. --dangerously-allow-anywhere bypasses the gate with an explicit log warning. The runtime keypress reads up to 8 bytes from os.Stdin and accepts only "y" / "Y" trimmed; EOF returns false (piped invocations without the flag will abort). Documented in the readYesConfirmation helper. Manual smoke (per plan): - `cd / && gnoma -p test` → refuses - `cd ~ && gnoma` → warns + keypress - `cd ~/git/some-repo && gnoma` → banner only - subcommands skip the gate entirely Linux + macOS classification; Windows path handling deferred per plan (treated as TierOK there until follow-up). Refs: docs/superpowers/plans/2026-05-23-startup-safety-banner.md	2026-05-23 22:19:39 +02:00
vikingowl	f9094f68f3	feat(router): [router].prefer = local | cloud | auto Implements P-1 through P-6 of the prefer-routing-policy plan. Adds a config knob that biases routing toward local arms, cloud arms, or leaves selection unchanged. Default "auto" is byte-identical to pre-change behavior (the new armTier path with PreferAuto returns the same value as the old single-arg function). Mechanism diverged from the plan after empirical testing: The plan called for a score multiplier applied in bestScored. Tests revealed the existing cost-floor math (scoreArm divides by weighted cost which collapses to ~0.001 for free local arms) gives local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier can't overcome. A tier-shift in armTier turned out cleaner: PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent) get +2 tier shift, landing behind locals. PreferCloud: IsLocal arms get +2 tier shift, landing behind cloud. SLM tier-0 arms shift to tier 2 (still below cloud's tier 3), so the SLM-protection semantic (small stuff stays on the small model) survives PreferCloud. This matches the open question in the plan, now resolved as: yes, SLMs keep winning under PreferCloud by design. The policyMultiplier was kept in bestScored as a within-tier nudge (mostly cosmetic in practice given the cost-floor dynamics described above; could matter when costs are calibrated). Worth revisiting once router-wide cost calibration lands. Strengths cross-tier promotion is unaffected: the promoted-set path in selectBest bypasses armTier entirely, so a strongly-tagged cloud arm still wins SecurityReview tasks under PreferLocal (validated by TestPreferPolicy_StrengthsBeatsMultiplier). CLI-agent subprocess arms count as "local" for PreferLocal purposes: they proxy to cloud but the user-visible behavior is local. Users who want to exclude them can use --provider X. 
Forced arms (--provider X) and incognito take priority over the policy: forced arm test pins this, incognito-still-wins test pins the LocalOnly hard filter dominating PreferCloud. Test coverage (prefer_test.go): ParsePreferPolicy / String round trips; policyMultiplier table; acceptance scenarios across all three policies with adjacent-tier arms; SLM-still-wins under PreferCloud; Strengths beats multiplier; forced-arm bypass; incognito beats prefer; lone cloud arm wins when no local feasible. Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md	2026-05-23 22:13:26 +02:00
vikingowl	162c8b1017	docs(plans): prefer-routing-policy and startup-safety-banner Two parallel pre-flight plans surfaced in the 2026-05-23 session, both deferred while the routing-defaults-refresh implementation landed. Drafted as separate plans because they're independent: the prefer-policy is a router scoring change; the safety banner is a launch-time check that never touches the router. prefer-routing-policy [router].prefer = "local" | "cloud" | "auto": soft score multiplier (0.3 / 0.5 / 1.0) biasing toward local or cloud arms while preserving Strengths cross-tier promotion and bandit learning. Default "auto" is byte-identical to current behavior. Forced arms and incognito retain priority. CLI-agent subprocess arms count as non-local for this knob (they proxy to cloud). startup-safety-banner Three-tier cwd classification at launch: refuse in /etc /sys and other system roots; warn+keypress in $HOME, /tmp, ~/Desktop, ~/Downloads; OK inside any git repo or directory with a project marker (.gnoma/, go.mod, package.json, etc.). Always shows a context banner with cwd, git state, model, modes, and a top-level sensitive-file inventory (.env, id_rsa, *.pem, .ssh/, etc.; informational only, no recursion, capped at 1000 entries). Bypass via --dangerously-allow-anywhere. Complements the in-flight sensitive-content unified-policy TODO item: this is the pre-flight layer, that is the runtime input-path layer. Both plans default-on with safe defaults; both have explicit out-of-scope sections to prevent scope creep during implementation. Linux + macOS first; Windows path classification deferred. TODO.md surfaces both as in-flight.	2026-05-23 22:00:21 +02:00
vikingowl	c99b2c64ad	docs(readme): document routing defaults table and [[arms]] overrides Closes R-8 of the routing-defaults plan. Adds a new "Routing defaults" section between Config and SLM that documents what arms ship with out-of-the-box — the family-keyed Strengths / MaxComplexity / CostWeight matrix plus the non-chat exclude list. Also introduces the [[arms]] override block in the README for the first time (previously undocumented), showing how users keep priority over the defaults. Links back to the plan doc for the benchmark sources and per-entry rationale.	2026-05-23 21:42:05 +02:00
vikingowl	2f8d4c412f	feat(router): cloud-arm defaults, gpt-5.3-codex registration Closes R-4 and R-5 of the routing-defaults plan. R-4: Strengths + CostWeight defaults for closed frontier models. Cloud entries land in the same knownFamilyDefaults table as local ones, with MaxComplexity intentionally left zero (cloud arms get no complexity ceiling). CostWeight tuned per the plan's rationale: claude-opus-4-7 → Planning/SecurityReview/Debug/Refactor, 0.3 claude-sonnet-4-6 → Generation/Refactor/Review, 0.7 gpt-5.5 → Planning/SecurityReview/Generation, 0.3 gpt-5.3-codex → Generation/Refactor/Debug/UnitTest, 0.6 gpt-5.2 → Orchestration/Review, 0.8 gemini-3.1-pro → Planning/Review/Orchestration, 0.5 gemini-3.5-flash → Boilerplate/Explain/Orchestration, 1.2 The 0.3 weight on frontier arms keeps them competitive on SecurityReview / Planning despite $4+/Mtok; 1.2 on Gemini Flash penalizes cost more so it only wins when cost is genuinely decisive (boilerplate, explain). Mechanism: extracted applyFamilyDefaults into defaults.go and call it from Router.RegisterArm. Single source of truth — both local discovery and the primary-provider path in cmd/gnoma/main.go now flow through the same defaults application. Removed the duplicate apply block from RegisterDiscoveredModels. Legacy model IDs (claude-opus-4-20250514, gpt-4o, o3, gemini-2.5-pro, etc.) intentionally do not match any table entry — keeps users on pinned older models safe from imposed 2026 Strengths. R-5: gpt-5.3-codex registration. - internal/provider/openai/provider.go: added to fallbackModels and inferOpenAIModelCapabilities (400K context, 32K output). - internal/provider/ratelimits.go: gpt-5.3-codex and its dated alias gpt-5.3-codex-2026-02-15 added with the same Tier 1 quotas as gpt-5.2. Gemini 3.x (3.1-pro-preview, 3.5-flash, 3.1-flash-lite) was already registered in both google/provider.go and ratelimits.go — no change needed for that part of R-5. 
Test coverage: - ResolveFamilyDefaults table-driven across all 7 cloud entries including prefix-sharing (gpt-5.5-pro → gpt-5.5 defaults, gemini-3.1-pro-preview → gemini-3.1-pro defaults). - Legacy IDs return !ok. - RegisterArm applies cloud defaults end-to-end. - User-supplied Strengths and CostWeight are not overridden. - ID.Model() fallback works when ModelName is empty (test code often constructs arms this way). Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md	2026-05-23 21:39:48 +02:00
vikingowl	9bb775a4aa	feat(router): full local family defaults table with size-keyed ceilings Expands the family-defaults scaffold to 23 entries covering the local models that currently appear in real Ollama fleets: coder specialists (qwen3-coder, devstral, qwen2.5-coder, yi-coder, deepseek-coder, starcoder), reasoners (phi-4, phi-4-mini), Gemma 2/3/4 (including the "edge" e2b/e4b variants under both Ollama and GGUF naming), Qwen 2.5/3/3.5 with a catch-all qwen entry, Mistral/Ministral (incl. the 24B mistral-small-3), Llama 3.2/4, tiny3.5 (reec's distill family), Granite, GLM (incl. glm-ocr specialist), and MiniCPM-V. Five families that span wide parameter ranges (qwen3.5, qwen3, qwen2.5, ministral-3, tiny3.5) now use SizeCap ladders instead of a flat MaxComplexity. A new parseSizeFromModelID helper splits the model ID on :/-_/ and matches pure <N>b/<N>m tokens, correctly ignoring qwen3.5 version strings, e2b edge tags, a3b MoE active params, and v0.3 version suffixes. ResolveMaxComplexity wraps ResolveFamilyDefaults plus the SizeCap traversal, falling back to the smallest cap when size parsing fails (conservative). Discovery's apply path now goes through it so SizeCap entries actually take effect. 
Test coverage: - parseSizeFromModelID (11 cases) - ResolveFamilyDefaults longest-prefix discipline (19 cases) - Unknown-family fallback returns !ok - ResolveMaxComplexity size-keyed ladder (13 cases) - Size-parse-failure fallback - knownFamilyDefaults invariants: SizeCaps ordered largest-first, SizeCaps and MaxComplexity mutually exclusive per entry - Routing-payoff integration: 3 arms (tiny3.5:1.5b, phi-4:14b, qwen3-coder:30b) get picked for TaskGeneration / TaskPlanning / TaskBoilerplate respectively, without any [[arms]] config - Local fleet visibility: the maintainer's actual `ollama ls` inventory registers correctly with expected MaxComplexity and Strengths; embeddinggemma stays filtered out The Planning sub-case surfaced a separate issue worth flagging: heuristicQuality floors out at 0.55 for a generic 14B local model without ThinkingModes, below TaskPlanning's 0.60 threshold. The test mutates phi-4's capabilities post-registration to reflect reality (phi-4 is reasoning-tuned). A discovery-side thinking-capability detection is out of scope for this plan but flagged in the test comment for follow-up. Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md	2026-05-23 21:34:09 +02:00
vikingowl	a79e99199d	feat(router): non-chat exclude, vision prefixes, family-defaults scaffold Discovery previously registered every model returned by Ollama as a chat arm, including embeddings, ASR, TTS, audio realtime, and rerankers — which then failed at inference time when the router selected them. Local arms also shipped with all-zero defaults, so selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b was effectively random. This change covers tasks R-1, R-2, R-6 from the routing-defaults plan. - nonChatModelPatterns + isNonChatModel substring matcher; matched IDs are skipped during RegisterDiscoveredModels. Covers whisper, moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding, embeddinggemma, -reranker, lfm2. - knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3 and minicpm-v entries stay for regression coverage. - New internal/router/defaults.go with FamilyDefaults struct, knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b resolves to "tiny3.5"). Single entry for now: functiongemma is registered with Disabled=true and MaxComplexity=0.40, reserved for the future ArmRoleToolRouter path. Table will grow in R-3. - RegisterDiscoveredModels consults ResolveFamilyDefaults and only populates fields that are still zero on the arm, so user [[arms]] overrides keep priority. Plans: - docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md - docs/superpowers/plans/2026-05-23-tool-router-specialization.md TODO.md surfaces both as in-flight items.	2026-05-23 21:24:59 +02:00
vikingowl	1606d19366	feat(subprocess/codex): account for cached and reasoning tokens codex 0.133.0 emits two token-accounting fields at top level that we previously dropped: cached_input_tokens — subset of input_tokens that hit the prompt cache (cheaper, but still counted in input_tokens per OpenAI Responses API semantics) reasoning_output_tokens — separately reported billable thinking tokens on reasoning-capable models Map cached_input_tokens to message.Usage.CacheReadTokens and subtract it from InputTokens. message.Usage.Add() sums InputTokens and CacheReadTokens as peers, so the uncached residual goes in InputTokens — matches the anthropic provider's convention and keeps cumulative usage tracking arithmetically correct. Fold reasoning_output_tokens into OutputTokens for accurate cost tracking. The top-level peer positioning (vs nested in output_tokens_details) implies a separately counted billable quantity, not a subset of output_tokens. Defensive clamp at zero in case a future codex build reports cached > input due to schema drift. Includes a verbatim regression guard against the live 2026-05-22 codex 0.133.0 output to catch schema changes early.	2026-05-22 13:35:57 +02:00
vikingowl	fe24907ce5	docs(readme): refresh post-v0.2.1 with badges and v0.2.x features - Add for-the-badge style shields (release, license, Go 1.26+, GHCR) - Drop the "until the first tag is cut" line that's been stale since v0.1.0 shipped on 2026-05-20 - Add a Vision / image input section covering Ctrl+V paste, literal [Image: /path] markers, the 10 MiB cap, the incognito carve-out, and the router's Vision capability gating - Add a Subprocess sandbox bypass subsection under Providers documenting GNOMA_AGY_BYPASS_PERMISSIONS and GNOMA_CODEX_BYPASS_SANDBOX as deliberate footguns - Add an Entropy false-positive reduction subsection under Security showing the [security].entropy_safelist opt-in (Phase F-1) and noting the per-pattern Debug telemetry that feeds F-2 gating	2026-05-22 13:21:31 +02:00
vikingowl	847ec159d7	chore(deps): promote cloud.google.com/go/auth and atotto/clipboard to direct go mod tidy (triggered by GoReleaser's before hook) correctly promoted both modules from indirect to direct: cloud.google.com/go/auth is imported by internal/provider/google for the ADC credential walk, and github.com/atotto/clipboard is imported by internal/tui for image-paste handling. Listing them as direct reflects actual usage and prevents tooling from suggesting their removal.	2026-05-22 13:06:36 +02:00
vikingowl	9ceddd39c1	chore(todo): track dockers_v2 migration under distribution follow-ups GoReleaser is phasing out the dockers + docker_manifests pair in favour of dockers_v2, which collapses our four-block setup into one. The migration also touches Dockerfile (per-platform binary layout in the build context), so it's worth scheduling as its own commit rather than a release-time rush.	2026-05-22 13:06:24 +02:00
vikingowl	3f74b6e362	fix(release): point GHCR image.source at GitHub mirror GHCR's package page auto-links to a GitHub repo via the org.opencontainers.image.source label. The previous value pointed at the Gitea canonical (somegit.dev/Owlibou/gnoma), which GHCR can't resolve — so the package page just showed a "Link this package to a repository" prompt and contributors, Readme, and discussions never auto-populated. Swap the two URL labels: source now points at the GitHub mirror, url keeps the Gitea canonical reference. Both arch build blocks updated. Takes effect on the next release (v0.2.0 images already shipped with the old labels and stay as-is). v0.2.1	2026-05-22 13:03:36 +02:00
vikingowl	49d80cf847	feat(security): format-aware entropy safelist (Phase F-1) Add a deterministic pre-extractor that skips known-safe token shapes before they reach the entropy scorer. Targets the false-positive regime that bites under lowered entropy_threshold or redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests (~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs. Config knob lives under the existing security section to match entropy_threshold / redact_high_entropy convention: [security] entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"] Empty / unset preserves pre-F-1 behaviour exactly — users opt in. Per-pattern Debug telemetry fires on every skip (pattern name + token length, never the token bytes). This is the data F-2's go/no-go gate depends on; the plan literally specifies it. NewFirewall validates names at the config boundary and emits a Warn for unknown entries so a typo like "uid" instead of "uuid" surfaces loudly instead of silently disabling FP reduction. Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold, mixed payload (safe shape + real secret) preserves the secret, secret-adjacent-to-UUID regression guard, empty safelist preserves pre-F-1 behaviour, unknown name silently dropped at scanner level but warned at firewall level, end-to-end FirewallConfig wiring, and the skip-telemetry log line. F-2 remains gated on real-workload FP-rate observations. v0.2.0	2026-05-22 12:39:10 +02:00
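Note on 49d80cf847: a format-aware safelist of this kind might look roughly like the sketch below, which checks a token against enabled shapes before any entropy scoring and separates unknown names so the caller can warn about them. The regular expressions and the `validateSafelist` and `matchSafeShape` names are assumptions for illustration, not gnoma's actual patterns or API, and only two of the four shapes are shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeShapes maps safelist names from the commit message to illustrative
// patterns. The regexes are assumptions, not the project's own.
var safeShapes = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$`),
}

// validateSafelist splits configured names into known and unknown ones, so
// a typo such as "uid" can be reported instead of silently ignored.
func validateSafelist(names []string) (known, unknown []string) {
	for _, n := range names {
		if _, ok := safeShapes[n]; ok {
			known = append(known, n)
		} else {
			unknown = append(unknown, n)
		}
	}
	return known, unknown
}

// matchSafeShape reports the first enabled pattern that matches the whole
// token. A match means the token would skip the entropy scorer.
func matchSafeShape(token string, enabled []string) (string, bool) {
	for _, n := range enabled {
		if re, ok := safeShapes[n]; ok && re.MatchString(token) {
			return n, true
		}
	}
	return "", false
}

func main() {
	known, unknown := validateSafelist([]string{"uuid", "sha_hex", "uid"})
	fmt.Println("known:", known, "unknown:", unknown)

	name, ok := matchSafeShape("123e4567-e89b-12d3-a456-426614174000", known)
	// Log only the pattern name and token length, never the token bytes.
	fmt.Println(name, ok)
}
```

An empty enabled list matches nothing, which corresponds to the opt-in behaviour the message describes.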
vikingowl	ea1a5361e2	chore: restore agy JSON-output TODO; idiomatic t.TempDir() in google test The worktree commit `12a6b83` dropped the "Native agy JSON output" backlog item alongside removing the agy agent. Since we restored agy in this branch, the TODO is relevant again — agy v1.0.0 still emits plain text and the prompt-augmentation fallback should be replaced by --output-format stream-json once the CLI supports it. Switch TestTryLoadOAuthCredentials_Formats to t.TempDir() to drop the unchecked os.RemoveAll defer that golangci-lint's errcheck caught after the merge.	2026-05-22 12:17:10 +02:00
vikingowl	246997c4be	Merge branch 'feat/agy-sdk-integration' into dev Brings in the Google auth precedence work (agy > gemini > ADC credential walk, fileTokenProvider expiry handling, slog-backed error reporting), the Codex CLI integration as a new subprocess agent, and the restoration of the agy subprocess agent that was accidentally removed by the initial codex commit. Sandbox-bypass flags on both agy and codex are now opt-out via env vars (GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX). Includes review-driven fixes: - ADC fallback now uses real DetectOptions (cloud-platform scope) - fileTokenProvider returns an error on expired tokens instead of shipping a known-dead bearer - TestNew_Precedence asserts which credential was actually picked - codex parser tolerates non-JSON banner / debug lines on stdout - codex usage takes max(input_tokens, prompt_tokens) so accounting can't silently undercount No conflicts expected with the dev image-content feature: the worktree branch only touches the google and subprocess provider families.	2026-05-22 12:15:32 +02:00
vikingowl	0975bf7118	docs(readme): list codex and vibe alongside claude/gemini/agy The subprocess CLI table only mentioned three agents; the full set now is claude, gemini, agy, codex, and vibe (Mistral). Bring the documentation in line with knownAgents.	2026-05-22 12:15:01 +02:00
vikingowl	afc31b0af4	fix(subprocess): restore agy alongside codex; env-gate sandbox bypass The original commit on this branch replaced the agy subprocess agent with codex (overwriting the slot in knownAgents, deleting agy_test.go and the agyParser). That was unintentional — agy (antigravity) is a distinct CLI from codex (OpenAI's). Antigravity will replace gemini when gemini retires on 2026-06-16, so it needs to keep its own slot. Restored: FormatAgyText constant, agyParser with newAgyParser and the line-delimited text parser, the agy CLIAgent entry in knownAgents with PromptResponseFormat:true, agy_test.go, and the agy case in newParser. Sourced from the parent commit so behavior matches what shipped before the codex change. Sandbox bypass: both agy (--dangerously-skip-permissions) and codex (--dangerously-bypass-approvals-and-sandbox) need a flag to run non-interactively (their stdin is closed; without it they block on approval prompts nobody can answer). Both default to ON for out-of-box behavior; operators with pre-approved trust config can opt out via GNOMA_AGY_BYPASS_PERMISSIONS=0 or GNOMA_CODEX_BYPASS_SANDBOX=0. Tests cover the on / opt-out / unknown value branches. TestKnownAgents_ValidFormats updated to accept the restored FormatAgyText.	2026-05-22 12:14:54 +02:00
vikingowl	1717f9f567	fix(subprocess/codex): tolerate non-JSON stdout, max-of-token-paths Codex emits banner / debug / "starting turn" lines to stdout interleaved with the JSON event stream. The parser previously returned an error on any line that wasn't a JSON object, which subprocessStream.Next treats as terminal — one stray banner aborted the whole turn. Skip lines that don't start with `{` after whitespace trim, and downgrade unparseable JSON-looking lines to a slog.Debug so they don't kill the stream either. Token accounting: usage payloads from newer codex builds occasionally carry both input_tokens and prompt_tokens (and likewise output / completion) with slightly different values. Always use the larger of the two so we can't silently undercount. Tests cover non-JSON banner skipping, malformed-JSON non-fatal-skip, and the max() behavior with both token fields populated.	2026-05-22 12:08:32 +02:00
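Note on 1717f9f567: the tolerant line handling and the max-of-two-fields rule can be sketched as follows. `parseEvents` and `maxTokens` are hypothetical names, and the event payloads in `main` are invented sample input rather than real codex output.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log/slog"
	"strings"
)

// parseEvents returns the JSON objects found in a mixed stdout stream.
// Lines that do not start with '{' after trimming are skipped, and
// JSON-looking lines that fail to parse are logged at Debug and skipped.
func parseEvents(stdout string) []map[string]any {
	var events []map[string]any
	sc := bufio.NewScanner(strings.NewReader(stdout))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "{") {
			continue // banner or debug output, not an event
		}
		var ev map[string]any
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			slog.Debug("skipping unparseable line", "err", err)
			continue
		}
		events = append(events, ev)
	}
	return events
}

// maxTokens picks the larger of two reported counts so that usage is
// never undercounted when both fields are present.
func maxTokens(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

func main() {
	// Invented sample stream: one banner, one valid event, one broken line.
	out := "starting turn\n{\"type\":\"example\"}\n{broken\n"
	fmt.Println(len(parseEvents(out)))
	fmt.Println(maxTokens(120, 118))
}
```

Note that bufio.Scanner rejects lines longer than its default buffer, so a real parser for large events would need to raise that limit with Scanner.Buffer.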
vikingowl	f83ace7ad6	fix(google): real ADC scopes, expired-token rejection, error reporting credentials.DetectDefault(nil) always returns "options must be provided", which made the ADC branch unreachable. Pass an explicit DetectOptions with the cloud-platform scope so users with GOOGLE_APPLICATION_CREDENTIALS or `gcloud auth application-default login` actually flow through ADC instead of falling out as "no credentials found". fileTokenProvider.Token used to return expired tokens unchanged. We don't perform an OAuth refresh exchange (the upstream CLI does that out-of-band into the file we read), so when the file isn't fresh the only safe move is to fail loudly with an actionable message rather than ship a known-dead bearer that genai forwards to Vertex AI and gets back a confusing 401. tryLoadOAuthCredentials previously swallowed all errors equally, so the precedence walker silently skipped past misconfigured files (chmod 0600 on the wrong user, half-written JSON, etc.). Now os.IsNotExist is silent (normal walking), everything else gets a slog.Warn with the path so an unreadable file is visible. selectOAuthCredentials extracts the precedence chain into a testable helper that also returns a CredentialSource tag identifying which path was chosen. The previous precedence test only asserted err == nil; the new test verifies that the agy file wins when both are present and that the fallback to gemini actually loads the gemini token.	2026-05-22 12:08:22 +02:00
vikingowl	7491a36bb7	docs(todo): track unified sensitive-content handling Pasted images, pasted text, and tool-read files all carry the same risk class (screenshots with API keys, terminal pastes with creds, .env reads). Today these are handled inconsistently — incognito gates persistence but not provider egress, the outgoing-scan firewall is text-only. Note the cross-cut with Phase F entropy work and the firewall path so this isn't lost.	2026-05-22 11:58:23 +02:00
vikingowl	bd41d76e32	refactor(tui): store pasted images in user cache, not project workdir Ctrl+V image paste used to write the file to .gnoma/pasted_image_.png under the project root, which polluted the workdir and risked committing screenshots that may contain sensitive content. Now writes to os.UserCacheDir() / gnoma / pasted-images/ (XDG cache on Linux, ~/Library/Caches on macOS, %LocalAppData% on Windows). The directory is created at 0700 and files at 0600 since pasted content can be sensitive. Each paste prunes entries older than 2 hours best-effort, so the cache doesn't accumulate across sessions. The 2h window safely covers any single turn including provider retries and slow subprocess CLIs that need the file to still exist on disk when they ingest the path. .gitignore: cover the legacy `.gnoma/pasted_image_` location for old checkouts; add log.txt and codex_out.jsonl which were tracked as runtime artifacts during the recent work. Tests cover cache-path placement, restrictive perms on both the directory and the file, the no-pollution-of-cwd invariant, and the prune behavior (stale removed, fresh kept, missing dir no-op).	2026-05-22 11:56:04 +02:00
vikingowl	c5cc98ed8a	feat(provider/openai): translate user image content to image_url parts When the user message has at least one ImageContent block, build a ChatCompletionContentPartUnionParam array with text + image_url parts instead of the string content path. Image bytes are inlined as a base64 data URL (data:<media-type>;base64,...). Adjacent text blocks are merged into a single TextContentPart. Pure-text user messages stay on the existing string fast path. This covers OpenAI direct + every openaicompat backend (Ollama, llama.cpp, llamafile) since they all share the same provider. Tests: pure text uses OfString; image present emits 2 content parts (text + image_url with the expected base64 payload); nil-Image blocks are dropped and adjacent text merges correctly.	2026-05-22 11:50:55 +02:00
vikingowl	bc137182d4	feat(engine): parse [Image: /path] markers, gate on Vision capability buildUserMessage replaces the unconditional NewUserText wrap inside SubmitWithOptions. When the active model advertises Vision and the input contains [Image: /path] markers, the markers are inlined as ImageContent blocks carrying the file bytes; otherwise the input is passed through as a single text block (legacy behavior preserved for subprocess CLIs that auto-ingest paths, e.g. gemini-cli). image_input.go: - imageMarkerRe extracts each [Image: ...] occurrence. - Per marker: validates absolute path, file (not dir), size cap of 10 MiB, image/* media type via http.DetectContentType. - On any validation failure, the marker is left as literal text and a warning is recorded — the turn still proceeds. Routing: latestUserHasImages drives task.RequiresVision in both the primary stream attempt and the retryOnTransient path, so failover arms also respect the vision requirement. Tests cover: no markers (single text block), single image (bytes captured into Image.Data, MediaType set), missing file (literal fallback + warning), relative path rejection, oversized rejection, non-image file rejection, multiple images interleaved with text.	2026-05-22 11:50:45 +02:00
vikingowl	a2b7f8eb3f	feat(router): vision capability gating and Ollama vision detection Task gains a RequiresVision bool; filterFeasible enforces it on both the primary feasibility pass and the last-resort fallback (no degradation to a non-vision arm — the model literally cannot consume image bytes). Ollama discovery now probes /api/show for vision capability: - details.families containing "clip" / "mllama" / "*vl" - capabilities array containing "vision" (newer Ollama) - name-prefix fallback for releases that predate either (llava, qwen2.5-vl, llama3.2-vision, moondream, pixtral, etc.) OllamaProbeResult replaces the map[string]bool tool cache so the single /api/show call can populate tools + vision + ctx-size in one probe. DiscoverOllama / DiscoverLocalModels signatures updated; nil-cache callers in cmd/gnoma keep working unchanged. RegisterDiscoveredModels propagates SupportsVision into the arm's Capabilities.Vision. Tests cover RequiresVision filtering in both the happy path (vision-only arm chosen when image present) and the fallback path (non-vision arm rejected even as last resort).	2026-05-22 11:50:33 +02:00
vikingowl	d37cc2dad3	feat(message): add ContentImage type for inline image bytes Extends the Content discriminated union with a fifth variant for inline image payloads. Image carries the raw bytes (captured at user-input time so the message snapshot is self-contained and survives source-file deletion), the IANA media type for the provider's image part, and the original path for logging. HasImages() lets providers decide whether to fall back to a text-only representation; providers that don't know about ContentImage will simply skip those blocks via TextContent().	2026-05-22 11:50:20 +02:00
vikingowl	e38cce5f1f	fix(tui): security hardening, race-safety, and event handling fixes Bundles the pending TUI work into a coherent batch. Bug fixes from external review: * expandPlaceholders: single-pass alternation regex over the original input prevents `#p\d+` / `#img\d+` tokens inside pasted content from being re-expanded after the bracket form is inlined. * /incognito: gate savePromptHistory and the Ctrl+V image-write branch on `!m.incognito` so the no-persistence contract holds. * history.txt: write at mode 0600 (chmod existing 0644 files), create parent dir at 0700, truncate to 500 entries on every save, slog.Warn on errors instead of swallowing. * triggerPickerAction: guard m.config.Engine before SetModel, matching the /model handler. * Picker key handler: navigation/enter/q consume, escape/ctrl+c close the picker AND fall through to global handlers (so streaming cancel and double-tap quit work with an overlay open), default swallows stray input. * Paste line count: report total non-empty lines instead of newline count, ignoring trailing newlines (no more "+0 lines" for "abc"). * Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response bind. /keys help text updated; picker help entries reordered. * Tighter perms on .gnoma/pasted_image_*.png (0600). Race-safety refactor: ApplyTheme used to mutate ~25 package-level lipgloss styles in place. Replaced with an immutable themeStyles snapshot and atomic.Pointer[themeStyles] swap. Readers go through a theme() helper (one atomic load) instead of touching package vars directly. No locks, no nested-RLock risk if rendering ever moves off-thread. Includes pre-existing in-flight work: TUISection in config with persistent theme/vim settings; /copy /theme /vim slash commands; provider-name completion; session.SetProvider for the provider picker. 
Tests: placeholder_test.go (6 regression + happy-path cases including the pasted-content collision), history_test.go (5 cases covering perms on new and existing files, on-disk truncation, blank-input, newline flattening), provider_test.go (provider switching + picker transitions + SLM gating).	2026-05-22 11:50:12 +02:00
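Note on e38cce5f1f: the immutable-snapshot pattern described in the race-safety paragraph can be sketched with sync/atomic's generic Pointer type. The fields of `themeStyles` and the `ApplyTheme` signature here are assumptions; the real snapshot holds lipgloss styles.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// themeStyles is an immutable snapshot. The fields are placeholders for
// the real style values.
type themeStyles struct {
	Name   string
	Accent string
}

// current holds the active snapshot. Writers replace the pointer and
// never mutate a snapshot that has already been published.
var current atomic.Pointer[themeStyles]

func init() {
	current.Store(&themeStyles{Name: "default", Accent: "#ffffff"})
}

// theme returns the active snapshot with a single atomic load.
func theme() *themeStyles {
	return current.Load()
}

// ApplyTheme builds a fresh snapshot and swaps it in atomically.
func ApplyTheme(name, accent string) {
	current.Store(&themeStyles{Name: name, Accent: accent})
}

func main() {
	before := theme()
	ApplyTheme("dark", "#ff8800")
	// A reader that loaded the old snapshot still sees consistent values.
	fmt.Println(before.Name, theme().Name)
}
```

A reader that takes one snapshot per render pass sees either the old theme or the new one in full, which is why no lock is needed.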
vikingowl	12a6b83cc9	feat: implement Google auth precedence and Codex integration	2026-05-22 00:21:32 +02:00


246 Commits