# gnoma

[![Release](https://img.shields.io/github/v/release/VikingOwl91/gnoma?style=for-the-badge&logo=go&logoColor=white&color=00ADD8)](https://github.com/VikingOwl91/gnoma/releases) [![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=for-the-badge)](LICENSE) [![Go](https://img.shields.io/badge/go-1.26%2B-00ADD8?style=for-the-badge&logo=go&logoColor=white)](go.mod) [![Container](https://img.shields.io/badge/ghcr.io-vikingowl91%2Fgnoma-2496ED?style=for-the-badge&logo=docker&logoColor=white)](https://github.com/VikingOwl91/gnoma/pkgs/container/gnoma)

**A provider-agnostic agentic coding assistant in Go.**

gnoma routes each prompt to the best available model — cloud or local — through a multi-armed bandit router, executes tools on your behalf, and stays extensible through hooks, skills, MCP servers, and plugins.

![gnoma TUI showing a routed turn](docs/img/gnoma-tui.png)

*Every turn shows which arm the router picked and why — here a local `qwen3:14b` was selected for a `generation` task.*

## What makes gnoma different

- **Multi-armed bandit router.** Per-prompt arm selection based on capability gates, declared `Strengths`, latency, and cost. Visible in the TUI on every turn — no black box.
- **`[router].prefer = local | cloud | auto`.** Pin routing toward local models, cloud, or let the bandit decide. Offline-first workflows still reach for Claude when the local model would obviously flail.
- **Tier-0 SLM routing.** A tiny local model classifies each prompt and handles trivial tasks itself, keeping the heavy provider for real work.
- **Content boundary + secret scanner.** Every outgoing LLM message and incoming tool result is scanned for secrets (regex + Shannon entropy on long tokens); matches are redacted or blocked at the content level. Paths are canonicalised (TOCTOU-safe), Unicode is sanitized (homoglyphs, BiDi tricks), and a `SafeProvider` boundary keeps incognito-mode data out of long-lived stores.
  *(Per-host network egress allowlist is on the roadmap, not in place today.)*
- **No phone-home.** gnoma itself sends nothing off-machine — zero analytics endpoint, zero metrics service, no remote logging. Prompts of course go to whatever provider you route them to: cloud arms ship data to that provider by design; pair Ollama/llama.cpp with `--incognito` if you want everything on-device.
- **Provider-agnostic from day one.** Anthropic, OpenAI, Google, Mistral, Ollama, llama.cpp, plus subprocess CLIs (`claude`, `codex`, `agy`, `vibe`). Mix cloud and local in the same session.
- **Vision end-to-end.** `[Image: /path]` markers in prompts, `Ctrl+V` paste in the TUI, capability-gated per arm.
- **Single static binary.** `CGO_ENABLED=0`, multi-arch container on ghcr.io. No daemon, no runtime deps.

## Status

Pre-1.0 (current: **v0.3.0**). Single maintainer, breaking changes possible. The provider, router, and engine surfaces are settling; config schema and TUI bindings may still shift between minor versions. Apache 2.0.

## Table of contents

- [Install](#install)
- [Quickstart](#quickstart)
- [Vision / image input](#vision--image-input)
- [Providers](#providers)
- [Config](#config)
- [Routing defaults](#routing-defaults)
- [SLM routing](#slm-small-language-model-routing)
- [Session persistence](#session-persistence)
- [Extensibility](#extensibility)
- [Subcommands](#subcommands)
- [Security](#security)
- [Development](#development)
- [About](#about)
- [License](#license)

---

## Install

### Pre-built binary (no Go toolchain required)

Releases are built by [GoReleaser](.goreleaser.yml) for `linux`, `darwin`, and `windows` × `amd64`/`arm64` as static (`CGO_ENABLED=0`) archives. Grab the one matching your OS/arch from :

```sh
# Linux/macOS one-liner (substitute the asset URL):
curl -fsSL | tar -xz -C /tmp
sudo mv /tmp/gnoma /usr/local/bin/
gnoma --version
```

Windows: download the `_windows_*.zip`, extract `gnoma.exe`, and put it on `%PATH%`.
### Docker

Multi-arch images (`linux/amd64`, `linux/arm64`) are published to GitHub Container Registry on each tagged release:

```sh
docker pull ghcr.io/vikingowl91/gnoma:latest
docker run --rm -it -v "$PWD:/workspace" ghcr.io/vikingowl91/gnoma:latest --version
```

Mount your project as `/workspace` (the image's working directory) and pass any provider keys via `-e VAR_NAME` — see the [Providers](#providers) table for env-var names.

### Go users

```sh
go install somegit.dev/Owlibou/gnoma/cmd/gnoma@latest   # latest tagged
go install somegit.dev/Owlibou/gnoma/cmd/gnoma@main     # bleeding edge
```

### Build from source

```sh
git clone https://somegit.dev/Owlibou/gnoma && cd gnoma
make build     # → ./bin/gnoma
make install   # → $GOPATH/bin/gnoma
```

Requires Go 1.26+.

---

## Quickstart

Set at least one provider key (env var names are listed in the [Providers](#providers) table below) — or run a local model and skip the keys entirely.

```sh
gnoma                        # interactive TUI
echo "list files" | gnoma    # pipe / one-shot mode
gnoma --provider ollama      # use a local model (no API key needed)
gnoma --version
```

Inside the TUI, `Ctrl+X` toggles **incognito** (no session saved, no router learning); `/help` lists slash commands; `Esc` cancels an in-flight turn.

---

## Vision / image input

`Ctrl+V` in the TUI pastes a screenshot from the system clipboard: gnoma writes the bytes to your user cache and inserts a `[Pasted image #imgN]` placeholder, which expands to `[Image: /path]` when the turn is sent. You can also type a literal `[Image: /path]` marker anywhere in a prompt to reference an existing file:

```
explain this error [Image: /tmp/screen.png] — what's the root cause?
```

Image markers are parsed by the engine. Files larger than 10 MiB are skipped (the marker stays as plain text), and the router sends vision-tagged turns only to arms that declare the `Vision` capability (Anthropic, OpenAI, Google, and Ollama models that advertise multimodal support).
Image paste is disabled under `--incognito` to honour the no-persistence contract.

---

## Providers

| Provider | Env var | Default model | Also available |
|---|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-6` | `claude-opus-4-7`, `claude-haiku-4-5-20251001` |
| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` | `gpt-5.5-pro`, `gpt-5.2`, `gpt-5.2-chat-latest` |
| Google (Gemini) | `GEMINI_API_KEY` (alt: `GOOGLE_API_KEY`) | `gemini-3.5-flash` | `gemini-3.1-pro-preview`, `gemini-3.1-flash-lite` |
| Mistral | `MISTRAL_API_KEY` | `mistral-large-latest` (Mistral Large 3) | `mistral-medium-3.5`, `magistral-medium-2509` |
| Ollama (local) | — | `qwen3:8b` (override with `--model`) | any model on your Ollama instance |
| llama.cpp (local) | — | reported by `/v1/models` | n/a |
| Subprocess (`claude`, `gemini`, `agy`, `codex`, `vibe` CLIs) | provider-specific | binary name | configurable via `[cli_agents]` |

Override per-invocation:

```sh
gnoma --provider anthropic --model claude-opus-4-7
gnoma --provider openai --model gpt-5.5-pro    # GPT-5.5 is the default; pro is the higher-accuracy tier
gnoma --provider google --model gemini-3.1-pro-preview
gnoma --provider ollama --model qwen2.5-coder:3b
gnoma --provider llamacpp                      # model picked from server
```

`gnoma providers` prints every discovered provider, model, and CLI agent.

**Subprocess sandbox bypass.** The `agy` and `codex` CLIs each run with their respective sandboxes enabled by default. Two env vars exist for the rare case where a sandbox blocks legitimate work (e.g., reading files outside the project root):

| Env var | Effect |
|---|---|
| `GNOMA_AGY_BYPASS_PERMISSIONS=1` | Skip agy's permission prompts |
| `GNOMA_CODEX_BYPASS_SANDBOX=1` | Disable codex's filesystem sandbox |

These are footguns — set them deliberately, per-invocation. They do not disable gnoma's own permission system, hooks, or firewall.
### Local models

Start your local server, then point gnoma at it:

```sh
# Ollama (default http://localhost:11434/v1)
ollama pull qwen2.5-coder:3b
gnoma --provider ollama --model qwen2.5-coder:3b

# llama.cpp (default http://localhost:8080/v1)
llama-server --model /path/to/model.gguf --port 8080 --ctx-size 8192
gnoma --provider llamacpp
```

Override the endpoint in `.gnoma/config.toml`:

```toml
[provider.endpoints]
ollama   = "http://myhost:11434/v1"
llamacpp = "http://localhost:9090/v1"
```

---

## Config

Configuration is merged in this order (lowest → highest priority):

1. Built-in defaults
2. `~/.config/gnoma/config.toml` — global base
3. `~/.config/gnoma/profiles/.toml` — active profile (when profile mode is enabled)
4. `/.gnoma/config.toml` — project override
5. Environment variables (`GNOMA_PROVIDER`, `GNOMA_MODEL`, `*_API_KEY`)

Example global config:

```toml
[provider]
default = "anthropic"
model   = "claude-sonnet-4-6"

[provider.api_keys]
anthropic = "${ANTHROPIC_API_KEY}"

[provider.endpoints]
ollama   = "http://localhost:11434/v1"
llamacpp = "http://localhost:8080/v1"

[permission]
mode = "auto"    # default | accept_edits | bypass | deny | plan | auto

[session]
max_keep = 20    # sessions retained per project
```

### Profiles

Drop multiple configs under `~/.config/gnoma/profiles/` and switch with `--profile ` or `/profile `. Each profile keeps its own router quality data and session history. Full details: [docs/profiles.md](docs/profiles.md).

---

## Routing defaults

Discovered arms ship with opinionated defaults — `Strengths` (per-task preference) and `MaxComplexity` (ceiling above which the arm won't be picked) — so a freshly-pulled fleet routes sensibly without any `[[arms]]` config. Defaults are matched against the model ID by longest prefix, so the most specific family wins; size-keyed families (Qwen 3, Ministral 3, tiny3.5, etc.) automatically scale `MaxComplexity` down for smaller variants.
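Longest-prefix matching matters because family names nest: `qwen3` is a prefix of both `qwen3-coder` and `qwen3.5`. A minimal sketch, assuming the provider prefix (e.g. `ollama/`) has already been stripped from the model ID. The `familyDefault` type and `defaultsFor` function are hypothetical; the entries are a few rows from the local-family table in this section, with each size-keyed `MaxComplexity` range replaced by its upper bound instead of gnoma's real size scaling.

```go
package main

import (
	"fmt"
	"strings"
)

// familyDefault is a hypothetical, trimmed-down view of per-family defaults.
type familyDefault struct {
	Strengths     []string
	MaxComplexity float64
}

// familyDefaults holds a few rows from the local-family table. Size-keyed
// ranges are collapsed to their upper bound for this illustration.
var familyDefaults = map[string]familyDefault{
	"qwen3":       {[]string{"Generation", "Refactor", "Debug"}, 0.75},
	"qwen3-coder": {[]string{"Generation", "Refactor", "Debug"}, 0.85},
	"qwen3.5":     {[]string{"Boilerplate", "Explain", "Orchestration"}, 0.65},
	"phi-4":       {[]string{"Planning", "Debug", "Review"}, 0.65},
	"phi-4-mini":  {[]string{"Boilerplate", "Explain"}, 0.35},
}

// defaultsFor returns the defaults whose key is the longest prefix of
// modelID, the matched prefix, and whether any family matched at all.
func defaultsFor(modelID string) (familyDefault, string, bool) {
	var (
		best   familyDefault
		prefix string
		found  bool
	)
	for p, d := range familyDefaults {
		// A strictly longer matching prefix replaces the current best, so
		// the result does not depend on map iteration order.
		if strings.HasPrefix(modelID, p) && len(p) > len(prefix) {
			best, prefix, found = d, p, true
		}
	}
	return best, prefix, found
}

func main() {
	for _, id := range []string{"qwen3-coder:30b", "qwen3.5:4b", "qwen3:14b", "phi-4-mini:3.8b", "llava:7b"} {
		d, prefix, ok := defaultsFor(id)
		if !ok {
			fmt.Printf("%s: no family default\n", id)
			continue
		}
		fmt.Printf("%s: family %q, MaxComplexity %.2f\n", id, prefix, d.MaxComplexity)
	}
}
```

Scanning every key is fine for a table this small; a larger table could sort prefixes by length or use a trie.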
Non-chat models (`embeddinggemma`, `whisper-base`, `kokoros`, `vibevoice`, `*-asr`, `*-tts`, `*-audio`, `*-reranker`, `*-embedding`) are skipped during discovery so they never register as broken chat arms.

| Local family | Strengths | MaxComplexity |
|---|---|---|
| `qwen3-coder` / `devstral` | Generation, Refactor, Debug | 0.85 |
| `qwen2.5-coder` | Generation, Refactor, UnitTest | 0.70 |
| `phi-4` | Planning, Debug, Review | 0.65 |
| `gemma4` (base ~9B) | Explain, Review, Generation | 0.70 |
| `gemma4-e` / `gemma-4-e` (edge 2B–4B) | Explain, Boilerplate | 0.45 |
| `mistral-small-3` | Orchestration, Review | 0.65 |
| `qwen3` | Generation, Refactor, Debug | 0.50–0.75 (size-keyed) |
| `qwen3.5` | Boilerplate, Explain, Orchestration | 0.40–0.65 |
| `ministral-3` | Orchestration, Planning | 0.35–0.70 |
| `tiny3.5` | Boilerplate, Explain | 0.20–0.30 |
| `phi-4-mini` / `llama3.2` / `granite` | Boilerplate, Explain | 0.30–0.35 |
| `functiongemma` | (Disabled — reserved for tool-router role) | 0.40 |

| Cloud model | Strengths | CostWeight |
|---|---|---|
| `claude-opus-4-7` | Planning, SecurityReview, Debug, Refactor | 0.3 |
| `claude-sonnet-4-6` | Generation, Refactor, Review | 0.7 |
| `gpt-5.5` | Planning, SecurityReview, Generation | 0.3 |
| `gpt-5.3-codex` | Generation, Refactor, Debug, UnitTest | 0.6 |
| `gpt-5.2` | Orchestration, Review | 0.8 |
| `gemini-3.1-pro` | Planning, Review, Orchestration | 0.5 |
| `gemini-3.5-flash` | Boilerplate, Explain, Orchestration | 1.2 |

`CostWeight` scales how much $/Mtok matters in scoring: values below 1.0 keep expensive frontier arms competitive on high-stakes tasks (Planning, SecurityReview); values above 1.0 penalize cost more so cheap fast arms only win when cost is genuinely decisive.

### Overriding the defaults

Drop an `[[arms]]` block in `config.toml` to override per-arm `Strengths` or `CostWeight`.
User values win; defaults only fill fields left at their zero value:

```toml
[[arms]]
id          = "anthropic/claude-opus-4-7"
strengths   = ["security_review", "planning", "debug"]
cost_weight = 0.2    # weight cost even less than the default 0.3

[[arms]]
id        = "ollama/qwen3-coder:30b"
strengths = ["generation", "refactor"]
```

Full rationale and benchmark sources behind these defaults: [`docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md`](docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md).

### Preferring local vs cloud

`[router].prefer` biases routing toward one camp without hard-filtering the other:

```toml
[router]
prefer = "auto"    # auto (default) | local | cloud
```

| Value | Effect |
|---|---|
| `"auto"` | No bias. Tier order (SLM → CLI-agent → local → cloud) decides, with Strengths and quality scores breaking ties. Default. |
| `"local"` | Cloud arms are demoted by 2 tiers. Local + CLI-agent arms always win unless no local option is feasible. |
| `"cloud"` | Local arms are demoted by 2 tiers. Cloud arms win, **except** for tier-0 SLMs — a small specialist arm whose `MaxComplexity` ceiling fits the task still wins, by design (the SLM is for small stuff). |

Three things still take priority over `prefer`:

- `--provider X` forces routing to that provider's arm.
- Incognito (`Ctrl+X` or `--incognito`) hard-filters cloud arms, so `prefer = "cloud"` under incognito still picks a local arm.
- A `Strengths`-tagged arm always wins its tagged task type, regardless of `prefer`. Tag Opus with `[security_review]` under `prefer = "local"` and Opus still wins SecurityReview tasks.

CLI-agent subprocess arms (`claude`, `gemini`, `vibe`) count as **local** for this knob: they proxy to cloud but run as local processes. Use `--provider ` if you need to pin a specific subprocess.

---

## SLM (small-language-model) routing

gnoma can run a tiny local model alongside the main provider to:

- **Classify** each prompt (task type + complexity + tool requirement) so the router picks the right arm.
- **Execute** trivial tasks itself (knowledge questions, single file reads, anything with complexity ≤ 0.3), keeping the heavy provider for real work.

```toml
[slm]
enabled          = true
backend          = "auto"         # ollama | llamacpp | llamafile | openaicompat | auto | disabled
model            = "qwen3:0.6b"
register_as_arm  = true           # default; set to false to make the SLM classifier-only
                                  # (e.g. for FunctionGemma, code-completion-tuned models)
classify_timeout = "15s"          # default; bump higher for slow cold-loads
```

Setup, presets, and verification: [docs/slm-backends.md](docs/slm-backends.md). The `auto` backend probes Ollama → llama.cpp → llamafile on startup and picks the first reachable option. Inspect with `gnoma slm status` and `gnoma router stats`.

---

## Session persistence

Sessions are auto-saved per project under `.gnoma/sessions//` after each completed turn. On a crash you lose at most the current in-flight turn.

```sh
gnoma --resume        # interactive picker
gnoma --resume        # restore by ID
gnoma -r              # shorthand
gnoma --incognito     # no save, no router learning
```

Inside the TUI: `/resume`, `/resume `, `Ctrl+X` (incognito toggle).

Router-quality data (EMA scores) is stored at `~/.config/gnoma/quality.json` (or `quality-.json` in profile mode).

---

## Extensibility

### MCP servers

Connect any [MCP](https://modelcontextprotocol.io)-compatible server:

```toml
[[mcp_servers]]
name    = "git"
command = "mcp-server-git"
args    = ["--repo", "."]
timeout = "30s"

# Optionally replace a built-in tool with an MCP one
[mcp_servers.replace_default]
exec = "bash"
```

MCP tools appear as `mcp__{server}__{tool}` unless mapped via `replace_default`.

### Skills

Drop markdown files into `.gnoma/skills/` or `~/.config/gnoma/skills/`. Invoke with `/`. List with `/skills`.
### Hooks

Shell commands run on tool events (`pre_tool_use`, `post_tool_use`, etc.):

```toml
[[hooks]]
name         = "block-rm-rf"
event        = "pre_tool_use"
type         = "command"
exec         = "bash-safety-check.sh"
tool_pattern = "bash*"
```

Ordering rules: [ADR-004](docs/essentials/decisions/004-posttooluse-hook-ordering.md).

### Plugins

Plugins bundle skills, hooks, and MCP server configs. Drop a plugin directory into `~/.config/gnoma/plugins/` (global) or `/.gnoma/plugins/` (project-local); gnoma auto-discovers them on startup.

Each plugin's `plugin.json` is pinned by SHA-256 on first load (Trust-On-First-Use). A manifest that changes between runs is refused with a clear error and a re-enrolment hint. Full model: [docs/plugins-trust.md](docs/plugins-trust.md) and [ADR-003](docs/essentials/decisions/003-plugin-trust.md).

### Elfs (sub-agents)

The `spawn_elfs` tool decomposes work into parallel sub-tasks. See [`internal/skill/skills/batch.md`](internal/skill/skills/batch.md) for the built-in batching skill.

---

## Subcommands

| Command | What it does |
|---|---|
| `gnoma providers` | List every discovered provider, model, and CLI agent |
| `gnoma profile list` / `show ` | Profile diagnostics |
| `gnoma router stats` | Quality EMA + classifier source breakdown |
| `gnoma slm setup` / `slm status` | Manage the llamafile-backed SLM |

Run `gnoma --help` for the full flag set.

---

## Security

gnoma runs tools and shell commands on your behalf. The [`internal/security`](internal/security) package canonicalises every path (TOCTOU-safe), scans every outgoing LLM message and incoming tool result for secrets (regex + Shannon entropy) before it reaches the model, and sanitizes Unicode (homoglyphs, BiDi tricks). The `SafeProvider` boundary keeps incognito-mode data out of long-lived stores.

> **Scope note.** The current "firewall" is a content boundary — it
> redacts/blocks secrets in inputs and outputs.
> It is **not** a network-egress firewall: outgoing HTTP from tools
> and providers goes through stock `http.Client`, with no per-host
> allowlist or dial-layer enforcement. Per-host egress rules and a
> per-session audit log of blocked/redacted events are tracked in
> [TODO.md](TODO.md).
>
> **Data flow.** gnoma itself emits no telemetry to external services
> — no analytics, no metrics endpoint, no remote logging. When you
> route to a cloud provider (Anthropic, OpenAI, Google, Mistral),
> prompts and tool data are sent to that provider as required to
> fulfill the request — by design. For fully on-device operation,
> use Ollama or llama.cpp and `--incognito`.

### Entropy false-positive reduction

The secret scanner also computes Shannon entropy on long unstructured tokens to catch unknown-format secrets. Under a lowered threshold or `redact_high_entropy = true`, this can fire on shapes that are never secrets (UUIDs, SHA digests, ISO-8601 timestamps, URLs). Opt into the format-aware safelist to skip them:

```toml
[security]
entropy_threshold   = 3.5
redact_high_entropy = true
entropy_safelist    = ["uuid", "sha_hex", "iso8601", "url"]
```

Default is an empty list — pre-safelist behaviour. Skips are logged (`Debug`-level, per pattern, token length only — never the bytes) so the real false-positive rate is measurable on real workloads.

### Startup safety check

gnoma classifies the current working directory before launch and refuses, warns, or allows based on tier:

| Tier | What | Behavior |
|---|---|---|
| **Refuse** | `/`, `/etc`, `/sys`, `/proc`, `/usr`, `/var`, `/bin`, `/sbin`, `/boot`, `/root`, `/dev` (and macOS equivalents `/System`, `/Library`, `/private`, `/Applications`) | Refuses to start. Exit code 2. |
| **Warn** | `$HOME`, `~/Desktop`, `~/Downloads`, `~/Documents`, `~/.config`, `~/.local`, `~/.cache`, `/tmp` | Prints a warning banner and waits for `y` keypress to continue. Anything else (including piped EOF) aborts with exit 1. |
| **OK** | Anywhere with a project marker (`.gnoma/`, `go.mod`, `package.json`, `pyproject.toml`, `Cargo.toml`, `Makefile`, `Dockerfile`, `build.gradle`, `pom.xml`) or inside a git repo | No prompt. |

A project marker anywhere — including inside `$HOME` — promotes the directory to OK. The banner is shown for every tier and summarizes cwd, git branch, project type, provider, model, modes, and a top-level sensitive-file inventory (`.env`, SSH keys, `*.pem`, `.ssh/`, `.aws/`, etc.).

```toml
[safety]
refuse_in_system_dirs  = true     # default
warn_in_home           = true     # default
require_project_marker = false    # default — being inside a git repo is enough
```

Bypass all safety checks with `--dangerously-allow-anywhere`. The flag is required for non-interactive invocations (piped stdin, CI) in warn-tier directories, since no human is present to consent.

Containers (`/.dockerenv` or `/run/.containerenv` present) automatically downgrade refuse-tier paths to warn-tier, because devcontainers commonly run from `/` or `/workspace`.

Full design: [`docs/superpowers/plans/2026-05-23-startup-safety-banner.md`](docs/superpowers/plans/2026-05-23-startup-safety-banner.md).

Architecture references:

- [docs/essentials/INDEX.md](docs/essentials/INDEX.md) — full architecture map
- [docs/essentials/decisions/](docs/essentials/decisions/) — ADRs 001–004

---

## Development

```sh
make build               # ./bin/gnoma
make test                # unit tests
make test-integration    # //go:build integration — requires real API keys
make cover               # coverage.html
make lint                # golangci-lint
make check               # fmt + vet + lint + test
```

Architecture, conventions, and TDD workflow: [CONTRIBUTING.md](CONTRIBUTING.md).

---

## About

Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called **elfs** (elf owl).

- **Upstream:**
- **GitHub mirror:** (read-only; PRs go to upstream Gitea)

## License

Apache License 2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).