23 Commits

Author SHA1 Message Date
vikingowl fa65a68728 docs(plans): config-migration and sensitive-content-policy
Promotes two TODO entries into phased plan docs and links them
from the TODO bullets.

config-migration plan covers the silent layered-config corruption
chain (encoder zero-spam -> reader overwrite -> wrong effective
values) and its remediation across five phases: encoder fix
(omitempty + pointer-numeric hybrid), project registry, gnoma
doctor, gnoma upgrade-config, and auto-migration on startup with
banner notice.

sensitive-content-policy plan unifies three input paths (pasted
text, pasted images, tool-read files) behind one decision API
with consistent UI surface and audit-log integration. Phases A-E
sequence the work from highest-leverage (text paste) to most
complex (image OCR with local vision arm).

Neither plan starts implementation in this commit — they exist to
make the design decisions explicit so the eventual code can be
reviewed against a written intent rather than a TODO bullet.
2026-05-24 22:51:33 +02:00
vikingowl 8b9bdc2978 feat(security): per-session firewall audit log
New AuditLogger writes one JSON line per firewall action to
<projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl so a user can
grep 'what did the firewall do this session?' after the fact.

Records 'block', 'redact', 'warn', and 'unicode_sanitize' events with
the matcher name, source (tool_result / message_text / etc.), and
token length. Discipline: never the bytes themselves — only the
matcher name and the length, matching the README's scope-note
promise about audit data.

Plumbing:
- Firewall gains an audit *AuditLogger field plus SetAudit setter.
  The firewall is constructed before the session ID exists, so the
  audit logger is wired post-hoc once main.go has the sessionID.
- Honours incognito: Record is a silent no-op when the firewall's
  IncognitoMode is active, preserving the no-persistence contract.
- Tolerant of fs errors: mkdir / open / encode failures log a Warn
  but never propagate; the scan pipeline must not depend on audit
  succeeding.
- Nil receiver is a valid no-op so callers don't need nil-guards
  around every Record.

Tracks 'Security boundary — per-session audit log' from the
v0.3.0 r/SideProject launch thread (u/Secret_Theme3192,
2026-05-24). Per-host egress allowlist remains separately tracked
pending the commenter's reply on host-level vs per-tool semantics.
2026-05-24 22:47:28 +02:00
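The audit-logger behaviour described in 8b9bdc2978 (nil-safe receiver, incognito no-op, swallowed filesystem errors, metadata-only records) could be sketched roughly as follows. This is an illustration under assumptions, not the committed code: the `auditEvent` field names, the `incognito` callback, the file permissions, and the `aws_key` matcher name are all hypothetical; only the file path layout and the four action names come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"
)

// auditEvent is one JSON line. It holds metadata only, never the matched bytes.
// Field names are assumptions made for this sketch.
type auditEvent struct {
	Time    time.Time `json:"time"`
	Action  string    `json:"action"` // block, redact, warn, unicode_sanitize
	Matcher string    `json:"matcher"`
	Source  string    `json:"source"` // tool_result, message_text, ...
	Length  int       `json:"length"`
}

// AuditLogger appends events to <projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl.
type AuditLogger struct {
	path      string
	incognito func() bool // assumed stand-in for the firewall's IncognitoMode check
}

func NewAuditLogger(projectRoot, sessionID string, incognito func() bool) *AuditLogger {
	return &AuditLogger{
		path:      filepath.Join(projectRoot, ".gnoma", "sessions", sessionID, "audit.jsonl"),
		incognito: incognito,
	}
}

// Record writes one line. It returns nothing: failures are logged and swallowed
// so the scan pipeline never depends on auditing succeeding.
func (a *AuditLogger) Record(action, matcher, source string, length int) {
	if a == nil { // nil receiver is a valid no-op
		return
	}
	if a.incognito != nil && a.incognito() { // no persistence in incognito
		return
	}
	if err := os.MkdirAll(filepath.Dir(a.path), 0o700); err != nil {
		log.Printf("WARN audit mkdir: %v", err)
		return
	}
	f, err := os.OpenFile(a.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		log.Printf("WARN audit open: %v", err)
		return
	}
	defer f.Close()
	ev := auditEvent{Time: time.Now().UTC(), Action: action, Matcher: matcher, Source: source, Length: length}
	if err := json.NewEncoder(f).Encode(ev); err != nil { // Encode appends '\n'
		log.Printf("WARN audit encode: %v", err)
	}
}

// runDemo exercises the three paths and returns the resulting file contents.
func runDemo() (string, error) {
	root, err := os.MkdirTemp("", "gnoma-audit-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	var unset *AuditLogger
	unset.Record("warn", "aws_key", "tool_result", 40) // nil logger: no-op, no panic

	hidden := NewAuditLogger(root, "s1", func() bool { return true })
	hidden.Record("block", "aws_key", "message_text", 40) // incognito: skipped

	live := NewAuditLogger(root, "s1", func() bool { return false })
	live.Record("redact", "aws_key", "tool_result", 40) // the only line written

	data, err := os.ReadFile(live.path)
	return string(data), err
}

func main() {
	out, err := runDemo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

Because `Record` has no return value, callers cannot accidentally make a scan result depend on the audit write, which is the property the commit message asks for.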
vikingowl eea26a262e feat(router): surface bandit knobs as [router.bandit] config
Four hardcoded constants in the selector and feedback tracker are now
user-tunable via [router.bandit]:

- quality_alpha    (EMA smoothing, default 0.3)
- min_observations (samples before observed overrides heuristic, default 3)
- observed_weight  (observed/heuristic blend ratio, default 0.7)
- strength_bonus   (quality bonus for Strengths-tagged arms, default 0.15)

Each field treats 0 as 'use default', so an empty TOML block is
byte-identical to pre-config behaviour. BanditParams is plumbed via
router.Config{Bandit: ...} and resolveBanditParams() centralises the
fallback so every call site shares the same defaults.

QualityTracker, scoreArm, bestScored, and selectBest signatures now
take the configured values directly rather than reaching for package-
level constants. Tests updated to pass BanditParams{} (defaults) or
explicit overrides where they validate the new tuning paths.

Tracks item #3 from the 'Bandit selector — design decisions deferred'
TODO entry — ships independently of the EMA vs SLM strategic decision.
2026-05-24 22:42:34 +02:00
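The "0 means use default" fallback from eea26a262e might look something like the sketch below. `BanditParams` and `resolveBanditParams` are named in the commit message and the default values are the ones it lists; the Go field names and the exact signature are assumptions.

```go
package main

import "fmt"

// BanditParams mirrors the four [router.bandit] keys. Field names are assumed.
type BanditParams struct {
	QualityAlpha    float64 // quality_alpha
	MinObservations int     // min_observations
	ObservedWeight  float64 // observed_weight
	StrengthBonus   float64 // strength_bonus
}

// resolveBanditParams replaces every zero field with its documented default,
// so an empty config block behaves like the old hardcoded constants.
func resolveBanditParams(p BanditParams) BanditParams {
	if p.QualityAlpha == 0 {
		p.QualityAlpha = 0.3
	}
	if p.MinObservations == 0 {
		p.MinObservations = 3
	}
	if p.ObservedWeight == 0 {
		p.ObservedWeight = 0.7
	}
	if p.StrengthBonus == 0 {
		p.StrengthBonus = 0.15
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{}))
	fmt.Printf("%+v\n", resolveBanditParams(BanditParams{ObservedWeight: 0.9}))
}
```

One consequence of this design, as far as the commit message describes it, is that a user cannot set any of these knobs to exactly 0; a zero is indistinguishable from "not configured".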
vikingowl 352cab4a94 docs(todo): extend config-migration plan with project registry
Adds item #5 to the config write/merge corruption entry:
~/.config/gnoma/projects.json tracking which directories gnoma has
been launched in. Enables doctor --all-projects, cross-project
session listing, and one-shot upgrade-config across all known
projects.

Documents the design constraints: must use the same omitempty /
atomic-write discipline as the encoder fix to avoid recreating the
class of bug it exists to help solve. Privacy footprint flagged
(local-only directory log; opt-out toggle). Stale-entry handling
gated through doctor, not auto-prune.
2026-05-24 22:29:56 +02:00
vikingowl 58f4001917 docs(todo): track config write/merge corruption + doctor/upgrade design
setConfig() serializes the entire Config struct on every key change,
which writes zero-valued fields into the file. On the next load those
explicit zeros override higher-priority layers via toml.Decode's
present-beats-absent semantics. Concrete symptom today: a global
prefer = 'cloud' was silently shadowed by a project prefer = ''.

Captures the multi-part fix surface so it doesn't get half-done:
- Stop generating zero-spam (omitempty hybrid or pelletier swap).
- gnoma doctor: read-only diagnostic (zero-spam, invalid enums,
  removed keys, effective-merged values).
- gnoma upgrade-config: active migration with .bak backup + diff.
- Auto-migrate project-level on startup with TUI banner notice;
  global stays explicit.
2026-05-24 22:24:59 +02:00
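The shadowing described in 58f4001917 can be reproduced with a toy model of layered config. The sketch below does not use a TOML library: each layer is a map in which a key exists only if the file mentioned it, which is an assumed simplification of the "present beats absent" decode semantics. The `layer`, `merge`, and `dropZero` names are hypothetical.

```go
package main

import "fmt"

// layer models one decoded config file: a key is present only if the file set it.
type layer map[string]string

// merge applies layers from lowest to highest priority. Any present key wins,
// even when its value is the empty string.
func merge(layers ...layer) layer {
	out := layer{}
	for _, l := range layers {
		for k, v := range l {
			out[k] = v
		}
	}
	return out
}

// dropZero removes empty-string entries, which is the effect an omitempty-style
// encoder would have had when the file was written.
func dropZero(l layer) layer {
	out := layer{}
	for k, v := range l {
		if v != "" {
			out[k] = v
		}
	}
	return out
}

func main() {
	global := layer{"router.prefer": "cloud"}
	project := layer{"router.prefer": ""} // zero-spam written by a full-struct encode

	fmt.Printf("%q\n", merge(global, project)["router.prefer"])           // prints ""
	fmt.Printf("%q\n", merge(global, dropZero(project))["router.prefer"]) // prints "cloud"
}
```

Dropping empty strings is safe here only because an empty `prefer` is never user intent; the same trick applied to a numeric field would lose an explicit 0.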
vikingowl 6c5e969217 feat(tui): add /router command for runtime routing-preference switch
Mirrors the pattern of /permission: bare command shows the current
value plus a help line; with an argument (auto/local/cloud) it calls
Router.SetPreferPolicy and emits a system message. Session-only — does
not write back to config.toml, matching /permission and Ctrl+X
incognito-toggle conventions.

Tab completion on the value via routerPreferModes alongside the
existing permissionModes pattern. Help text updated. Status-bar
indicator deferred (separate concern if it turns out to be wanted).
2026-05-24 22:13:27 +02:00
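A rough sketch of the `/router` handling in 6c5e969217, assuming a bare command reports the current value and an argument is validated against `routerPreferModes`. `SetPreferPolicy` and `routerPreferModes` are named in the commit message; the `router` type, the handler name, its signature, and the message wording are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// routerPreferModes doubles as the tab-completion source for the argument.
var routerPreferModes = []string{"auto", "local", "cloud"}

// router is a hypothetical stand-in for the real Router type.
type router struct{ prefer string }

func (r *router) SetPreferPolicy(mode string) { r.prefer = mode }

// handleRouterCommand returns the system message to show in the TUI.
// It only mutates in-memory state; nothing is written back to config.toml.
func handleRouterCommand(r *router, arg string) string {
	arg = strings.TrimSpace(arg)
	if arg == "" {
		return fmt.Sprintf("router prefer: %s (usage: /router %s)",
			r.prefer, strings.Join(routerPreferModes, "|"))
	}
	for _, mode := range routerPreferModes {
		if arg == mode {
			r.SetPreferPolicy(mode)
			return "router prefer set to " + mode + " for this session"
		}
	}
	return fmt.Sprintf("unknown routing preference %q", arg)
}

func main() {
	r := &router{prefer: "auto"}
	fmt.Println(handleRouterCommand(r, ""))
	fmt.Println(handleRouterCommand(r, "cloud"))
	fmt.Println(handleRouterCommand(r, "fastest"))
}
```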
vikingowl 74bd570438 fix(tui): de-dupe /init in command picker; skill names shadow builtins
/init appeared twice in the completion picker — once from the static
builtinCommands list and once from the bundled init skill at
internal/skill/skills/init.md (registered via skills.All()).

Two changes:

- Remove /init from builtinCommands. The skill provides the canonical
  entry, and its description ('Generate or update AGENTS.md project
  documentation') is more accurate than the static one ('initialize
  project — create AGENTS.md') because the skill handles both create
  and update.
- Refactor completionSource() so a skill name silently shadows any
  builtin with the same name. Prevents this from recurring if a
  future builtin migrates to a skill, and lets users override a
  builtin's description by dropping a skill of the same name into
  .gnoma/skills/.
2026-05-24 22:08:46 +02:00
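The shadowing rule from 74bd570438 amounts to a keyed merge in which skills are applied after builtins. A minimal sketch, assuming a simple `command` struct and a two-slice signature (the real `completionSource()` may take different inputs):

```go
package main

import (
	"fmt"
	"sort"
)

// command is a hypothetical picker entry.
type command struct {
	Name        string
	Description string
}

// completionSource merges builtins and skills by name. Skills are written
// last, so a skill silently shadows a builtin with the same name.
func completionSource(builtins, skills []command) []command {
	byName := make(map[string]command, len(builtins)+len(skills))
	for _, c := range builtins {
		byName[c.Name] = c
	}
	for _, c := range skills {
		byName[c.Name] = c
	}
	out := make([]command, 0, len(byName))
	for _, c := range byName {
		out = append(out, c)
	}
	// Sort for a stable picker order; map iteration order is random.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	builtins := []command{
		{"help", "show help"},
		{"init", "initialize project"},
	}
	skills := []command{
		{"init", "Generate or update AGENTS.md project documentation"},
	}
	for _, c := range completionSource(builtins, skills) {
		fmt.Printf("/%s  %s\n", c.Name, c.Description)
	}
}
```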
vikingowl d38d7daf25 fix(subprocess/agy): disable ToolUse until stream-json lands
agy is registered with FormatAgyText and the agyParser emits every
stdout line as a plain EventTextDelta. There is no path for a
structured ToolCall event to come back. With ToolUse=true the router
would dispatch tool-needing tasks (security_review, spawn_elfs, file
edit) to agy; the underlying Gemini model would describe calling the
tool in prose — invented UUIDs and 'I will pause now'-style stubs —
the engine would receive only text, and the turn would hang waiting
for a tool call that never arrives.

Surfaced when /init routed to agy for a security_review task and
elf spawning visibly hallucinated in the TUI. Capability flag
flipped to false; agy stays usable for tool-free prompts (explain,
summarize, simple chat). TODO entry for native stream-json updated
to flag that the capability flip is part of that same change.
2026-05-24 21:58:22 +02:00
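The effect of flipping the capability flag in d38d7daf25 can be pictured as a filter applied before arm selection. This is a sketch under assumptions: the `arm` and `task` types, the `NeedsTools` field, and the `eligibleArms` helper are hypothetical, and the real router presumably applies more gates than this one.

```go
package main

import "fmt"

// arm is a hypothetical routing target with a single capability flag.
type arm struct {
	Name    string
	ToolUse bool
}

// task is a hypothetical unit of work.
type task struct {
	Kind       string
	NeedsTools bool
}

// eligibleArms drops arms that cannot return structured tool calls when the
// task requires them. Tool-free tasks may still use every arm.
func eligibleArms(arms []arm, t task) []arm {
	var out []arm
	for _, a := range arms {
		if t.NeedsTools && !a.ToolUse {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	arms := []arm{
		{Name: "agy", ToolUse: false}, // text-only parser, so no tool calls
		{Name: "claude", ToolUse: true},
	}
	fmt.Println(eligibleArms(arms, task{Kind: "security_review", NeedsTools: true}))
	fmt.Println(eligibleArms(arms, task{Kind: "explain", NeedsTools: false}))
}
```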
vikingowl 06d4069076 ci: pin GoReleaser to the triggering tag, fix tag-collision regression
When v0.3.1 was tagged on the same commit as v0.3.1-rc2, the release
workflow built and tried to publish rc2 artifacts instead of v0.3.1,
failing with 'already_exists' on every asset upload.

Root cause: goreleaser-action@v6 + 'version: latest' (locked to v2.x)
falls back to 'git describe --tags' for the current tag, which picked
v0.3.1-rc2 over v0.3.1 when both refs pointed at HEAD. Explicitly
setting GORELEASER_CURRENT_TAG = github.ref_name forces the workflow
to use the tag that triggered it, regardless of other refs at the same
commit.
2026-05-24 17:36:01 +02:00
vikingowl f641bd4971 docs(todo): track bandit selector design questions
Two related items surfaced from the r/coolgithubprojects v0.3.1
launch thread. Bundled because they share the selector code:

1. Whether to keep numeric EMA at all post-SLM dispatcher (open
   strategic question from the 2026-05-07 roadmap — not a
   must-implement).
2. Surfacing hardcoded selector knobs (qualityAlpha, blend ratio,
   strength bonus, quality floor) as [router.bandit] config keys —
   ships independently of #1.
2026-05-24 17:34:13 +02:00
vikingowl 798f2ab3c3 fix(release): prerelease auto-detect; changelog excludes scoped conventional commits
Two polish issues surfaced by the v0.3.1-rc1 pipeline test:

- The release was tagged v0.3.1-rc1 but published without the
  prerelease flag, so it appeared alongside stable releases. Add
  'prerelease: auto' to release.github so GoReleaser marks any tag
  with a semver prerelease suffix (-rc, -beta, -alpha, -pre)
  appropriately.

- The changelog filters used '^docs:' patterns that only match bare
  conventional commits. Scoped variants like 'docs(readme):' and
  'chore(make):' slipped through into the published changelog.
  Switch to '^docs[:(]' style patterns to match both forms, and add
  '^style[:(]' so gofmt-drift commits are excluded too.
2026-05-24 17:05:49 +02:00
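The changelog-filter fix in 798f2ab3c3 is easy to check in isolation, assuming the exclude patterns are evaluated as Go regular expressions against the commit subject. The `excluded` helper below is hypothetical and exists only to compare the old and new patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// excluded reports whether a commit subject matches an exclude pattern.
func excluded(pattern, subject string) bool {
	return regexp.MustCompile(pattern).MatchString(subject)
}

func main() {
	subjects := []string{
		"docs: fix typo",
		"docs(readme): add TOC",
		"feat(tui): add /router command",
	}
	for _, s := range subjects {
		// Old pattern misses the scoped form; the new one catches both.
		fmt.Printf("%-32q old=%-5v new=%v\n", s, excluded(`^docs:`, s), excluded(`^docs[:(]`, s))
	}
}
```

The character class `[:(]` matches either the colon of a bare type or the opening parenthesis of a scope, so one pattern covers both forms without excluding unrelated commit types.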
vikingowl 9814795b3c ci: migrate release pipeline from Woodpecker to GitHub Actions
Drop the broken .woodpecker/release.yml (top-level when: triggered an
'error' status on every dev push instead of skipping non-tag events)
and replace with .github/workflows/release.yml driving the same
GoReleaser flow.

Rationale:
- Release artifacts already land on GitHub (releases + ghcr.io), so
  running the pipeline on GitHub eliminates a build hop.
- GH Actions auto-provides GITHUB_TOKEN with packages:write via the
  workflow permissions block — no PAT plumbing or login secrets.
- docker/setup-qemu-action and docker/setup-buildx-action handle the
  multi-arch cross-build setup that Woodpecker would require manual
  host configuration for.

Trigger: any tag matching refs/tags/v*. Mirror sync from somegit.dev
propagates tags to GitHub, so 'git push origin v0.3.1' on the canonical
remote still drives the GitHub-side release.
2026-05-24 16:45:17 +02:00
vikingowl 047924da2b ci(woodpecker): release pipeline on vX.Y.Z tag
Runs 'go test ./...' then 'goreleaser release --clean' inside the
official goreleaser image when a tag matching refs/tags/v* is pushed.
GITHUB_TOKEN comes from the 'github_token' repo secret (needs repo +
write:packages scopes) and is reused for ghcr.io docker login so the
multi-arch image build can push.

Runner requirements documented inline: docker socket access plus QEMU
registered on the host (tonistiigi/binfmt --install all) for arm64
cross-builds. Directory form chosen so a non-release CI pipeline can
land later under .woodpecker/ci.yml without restructuring.
2026-05-24 16:38:24 +02:00
vikingowl a23eb6b92c style: gofmt drift from prior commits
Pure whitespace cleanup surfaced when 'make check' ran gofmt over the
tree. Mostly struct-field column alignment in internal/safety/banner.go
(SessionInfo) and the var(...) flag block in cmd/gnoma/main.go after
--dangerously-allow-anywhere was added without realignment. Verified
zero substantive changes via 'git diff --ignore-all-space
--ignore-blank-lines'.
2026-05-24 16:33:17 +02:00
vikingowl 0981fb82d6 chore(make): add govulncheck and semgrep to 'make check'
Both checks already passed locally on the current dev tip; wiring them
into the canonical pre-commit gate so security regressions fail fast
instead of leaking into a release.

- 'make vuln' runs govulncheck with reachability analysis against the
  Go vuln DB.
- 'make sec' runs semgrep with p/golang + p/security-audit, metrics
  off, --error so findings exit non-zero.

Tools must be installed locally (commands in Makefile comments). If
upstream Woodpecker CI runs 'make check', it will need both binaries
on the runner image.
2026-05-24 16:30:54 +02:00
vikingowl 3888966e68 fix(deps): bump golang.org/x/net to v0.55.0 to clear reachable CVEs
govulncheck flagged two reachable vulnerabilities in
golang.org/x/net@v0.52.0:

- GO-2026-5026 (idna fails to reject ASCII-only Punycode labels),
  reached via router.DiscoverOllama -> http.Client.Do -> idna.ToASCII.
- GO-2026-4918 (HTTP/2 transport infinite loop on bad
  SETTINGS_MAX_FRAME_SIZE), same call path -> http2.Transport.*.

Bumping to v0.55.0 covers both. Transitive bumps to x/crypto v0.51.0,
x/sys v0.45.0, x/text v0.37.0. Post-bump govulncheck reports 0
reachable vulnerabilities and 0 in directly imported packages.
2026-05-24 16:27:28 +02:00
vikingowl 847cd5fe0c fix(security): use crypto/rand for session-ID suffix
Semgrep flagged math/rand for the /tmp artifact-directory session-ID
generation. Modern Go (1.20+) auto-seeds the global math/rand source
so this wasn't exploitable in practice, but crypto/rand is the
idiomatic choice for any security-adjacent identifier and removes the
finding from future security audits.

Drops the mrand alias entirely; reads 8 random bytes once and masks
to 24 bits to preserve the existing %06x suffix format.
2026-05-24 16:22:50 +02:00
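The suffix generation described in 847cd5fe0c (8 random bytes, masked to 24 bits, formatted with `%06x`) could be written along these lines. The function name and the big-endian byte interpretation are assumptions; the commit message does not say how the bytes are converted to an integer.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"log"
)

// sessionSuffix returns a 6-character lowercase hex string from crypto/rand.
func sessionSuffix() (string, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	// Keep only the low 24 bits so the value always fits in six hex digits.
	n := binary.BigEndian.Uint64(buf[:]) & 0xFFFFFF
	return fmt.Sprintf("%06x", n), nil
}

func main() {
	s, err := sessionSuffix()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s)
}
```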
vikingowl 001865f069 fix(env): correct ANTHROPIC_API_KEY typo, add missing vars
The placeholder ANTHROPICS_API_KEY (with trailing S) silently failed:
the auth layer reads ANTHROPIC_API_KEY, so anyone copying .env.example
to .env and pasting their key would see gnoma never pick it up, with
no clear error.

Also surfaces vars that already work but weren't templated:
GOOGLE_API_KEY (alternative to GEMINI_API_KEY), GNOMA_PROVIDER and
GNOMA_MODEL (config overrides), and the two subprocess sandbox bypass
footguns (GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX),
left commented out so they don't accidentally turn on.
2026-05-24 16:16:39 +02:00
vikingowl c1c52f139d docs(readme): add 'no phone-home' bullet and data-flow scope note
Clarify that gnoma itself emits no telemetry to external services
while being explicit that cloud-provider arms send data to those
providers by design. Adds:
- 'No phone-home' bullet to the differentiator list, naming the
  on-device path (Ollama/llama.cpp + --incognito).
- 'Data flow' paragraph to the Security scope-note blockquote so
  the framing is consistent between the hero bullets and the
  Security section.
2026-05-24 16:00:40 +02:00
vikingowl 7040041f13 docs(readme): correct firewall scope; track egress controls in TODO
The 'What makes gnoma different' bullet and Security section both
implied a network-egress firewall. Today the Firewall only enforces a
content boundary (secret scan, Unicode sanitize, redact/block). Reword
both spots and add a Scope note. Surface the gap as a top-of-TODO
entry covering per-session audit log and per-host egress allowlist,
with the open design question (host-level vs per-tool) called out.
Raised via r/SideProject v0.3.0 launch thread.
2026-05-24 15:50:35 +02:00
vikingowl 1828151162 docs(claude): big-picture architecture and expanded test commands
Add a 'Big picture' section summarising the request flow (cmd →
session → engine → router → security/permission → extensibility) so
future Claude Code instances can orient without reading INDEX.md plus
five package directories first. Note that internal/safety and
internal/slm aren't in INDEX.md yet. Document the somegit.dev /
GitHub mirror split and the ruleset that blocks force-push and
deletion on main/dev. Expand build/test section with make check, make
test-integration, single-test, and benchmark commands.
2026-05-24 15:39:23 +02:00
vikingowl b5062d59e9 docs(readme): hero screenshot, differentiators, status, TOC
Add docs/img/gnoma-tui.png as a hero image so visitors see the TUI
above the fold instead of a wall of text. Pull the bandit router,
prefer-policy, SLM, and built-in firewall out of buried sections into
a 'What makes gnoma different' bullet list. Add a Status block flagging
pre-1.0 and a table of contents. Move the pygmy-owl naming note and
upstream/mirror URLs into a footer About section.
2026-05-24 15:39:14 +02:00
vikingowl b13a6a2801 docs(plans): mark v0.3.0 plans shipped
Three plans shipped end-to-end in v0.3.0; removing them from
TODO.md In-flight and adding a Status: shipped header to each
plan doc with the commit references.

Shipped:
- 2026-05-23-routing-defaults-refresh.md
- 2026-05-23-prefer-routing-policy.md
- 2026-05-23-startup-safety-banner.md

Still in flight (telemetry-gated, fires only if measurements
support it):
- 2026-05-23-tool-router-specialization.md
2026-05-23 22:45:05 +02:00
36 changed files with 1689 additions and 164 deletions
+13 -2
@@ -1,4 +1,15 @@
MISTRAL_API_KEY="asd**"
ANTHROPICS_API_KEY="sk-ant-**"
# --- LLM provider keys (set at least one) ---
ANTHROPIC_API_KEY="sk-ant-**"
OPENAI_API_KEY="sk-proj-**"
GEMINI_API_KEY="AIza**"
# Alternative to GEMINI_API_KEY (either is accepted)
# GOOGLE_API_KEY="AIza**"
MISTRAL_API_KEY="**"
# --- Optional overrides (config can also set these) ---
# GNOMA_PROVIDER="anthropic"
# GNOMA_MODEL="claude-sonnet-4-6"
# --- Subprocess sandbox bypass (footguns — set deliberately) ---
# GNOMA_AGY_BYPASS_PERMISSIONS=1
# GNOMA_CODEX_BYPASS_SANDBOX=1
+68
@@ -0,0 +1,68 @@
# Release workflow — runs when a vX.Y.Z tag is pushed (including mirror
# pushes from somegit.dev). Drives GoReleaser to publish:
# - static binaries (linux/darwin/windows × amd64/arm64) + checksums
# + autogenerated changelog to the GitHub releases page
# - multi-arch container images to ghcr.io/vikingowl91/gnoma
#
# GITHUB_TOKEN is provided automatically by GitHub Actions and already
# carries packages:write thanks to the permissions block, so no PAT is
# needed for either the release upload or the ghcr.io push.
#
# Security note: this workflow does not interpolate any untrusted
# context (commit messages, PR titles, issue bodies) into shell commands.
# All ${{ ... }} references live in with: / env: blocks, which are
# safely passed as strings rather than evaluated as shell.
name: Release
on:
  push:
    tags:
      - "v*"
permissions:
  contents: write
  packages: write
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: "1.26"
      - name: Setup QEMU
        uses: docker/setup-qemu-action@v3
      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Test
        run: go test ./...
      - name: GoReleaser
        uses: goreleaser/goreleaser-action@v6
        with:
          version: latest
          args: release --clean
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Force GoReleaser to use the triggering tag rather than fall
          # back to `git describe` — which can resolve to an older tag
          # (e.g., a vX.Y.Z-rc tag) when multiple tags point at the same
          # commit. Surfaced as the v0.3.1 release failure on 2026-05-24.
          GORELEASER_CURRENT_TAG: ${{ github.ref_name }}
+9 -3
@@ -37,9 +37,12 @@ changelog:
sort: asc
filters:
exclude:
- "^docs:"
- "^test:"
- "^chore:"
# Match both bare and scoped conventional commits, e.g. both
# "docs:" and "docs(readme):" should be excluded.
- "^docs[:(]"
- "^test[:(]"
- "^chore[:(]"
- "^style[:(]"
# Multi-arch Docker images published to GitHub Container Registry.
# Build host needs Docker buildx and a `docker login ghcr.io` for the
@@ -98,3 +101,6 @@ release:
github:
owner: VikingOwl91
name: gnoma
# Auto-detect prereleases from semver: tags with -rc, -beta, -alpha,
# -pre, etc. suffix get marked as prerelease on GitHub.
prerelease: auto
+50 -10
@@ -5,20 +5,60 @@ Provider-agnostic agentic coding assistant in Go 1.26.
Named after the northern pygmy-owl (Glaucidium gnoma).
Agents are called "elfs" (elf owl).
## Module
`somegit.dev/Owlibou/gnoma`
## Module & repo layout
- Module: `somegit.dev/Owlibou/gnoma`
- Upstream (primary, accepts PRs): <https://somegit.dev/Owlibou/gnoma>
- GitHub mirror (read-only): <https://github.com/VikingOwl91/gnoma>
PRs go to the upstream Gitea instance, not GitHub. The GitHub side is a
push mirror — direct pushes to `main`/`dev` there will be rejected by the
ruleset.
## Big picture (read this before diving in)
Single static Go binary. Request flow:
1. `cmd/gnoma` parses flags, picks TUI vs pipe mode, builds the session.
2. `internal/session` owns one chat lifecycle; `internal/engine` runs the
agentic loop (stream → tool calls → re-query → until done).
3. `internal/router` picks the arm per prompt: multi-armed bandit over
provider adapters in `internal/provider/{anthropic,openai,google,mistral,openaicompat}`,
tiered SLM (`internal/slm`) → CLI-agent subprocess → local → cloud,
with `Strengths` + `MaxComplexity` + `CostWeight` shaping selection.
4. `internal/security` is the safety boundary: SafeProvider wrapping,
firewall (network egress), secret scanner, redaction, incognito mode.
`internal/safety` is separate — it's the pre-launch CWD classifier.
5. `internal/tool` is the local-action boundary; `internal/permission`
gates every tool call.
6. Extensibility surfaces: `internal/hook`, `internal/skill`,
`internal/mcp` (JSON-RPC over stdio), `internal/plugin` (TOFU-pinned).
Discriminated unions (struct + type discriminant) are the project's
chosen way to model variants — see `internal/message` and
`internal/stream`. Don't reach for interfaces when a discriminant fits.
Full essentials (vision, domain model, ADRs, process flows):
`docs/essentials/INDEX.md`. **Read INDEX.md before changing
architectural boundaries or adding new packages.** Note: INDEX
predates `internal/safety` and `internal/slm` — cross-check the actual
tree.
## Build & Test
```sh
make build # build binary to ./bin/gnoma
make test # run all tests
make lint # run golangci-lint
make cover # test with coverage report
```
make build # ./bin/gnoma
make test # unit tests
make test-integration # //go:build integration — needs real API keys
make lint # golangci-lint run ./...
make check # fmt + vet + lint + test — canonical pre-commit gate
make cover # coverage.html
## Project Essentials
Project architecture, domain model, and design decisions: `docs/essentials/INDEX.md`
Read INDEX.md before making architectural changes or adding new system boundaries.
# Run a single test / package
go test -run TestRouterSelect ./internal/router/
go test -v ./internal/router/
# Benchmarks
go test -bench=. ./internal/router/
```
## Conventions
+12 -2
@@ -1,4 +1,4 @@
.PHONY: build run check install test lint cover clean fmt vet
.PHONY: build run check install test lint cover clean fmt vet vuln sec
BINARY := gnoma
BINDIR := ./bin
@@ -10,7 +10,7 @@ build:
run: build
$(BINDIR)/$(BINARY)
check: fmt vet lint test
check: fmt vet lint test vuln sec
@echo "All checks passed!"
install:
@@ -43,3 +43,13 @@ clean:
tidy:
go mod tidy
# Reachability-checked dependency vuln scan against the Go vuln DB.
# Install: go install golang.org/x/vuln/cmd/govulncheck@latest
vuln:
	govulncheck ./...
# Static security analysis via Semgrep (Go ruleset + security-audit).
# Install: pip install semgrep (or: brew install semgrep)
sec:
	semgrep --config=p/golang --config=p/security-audit --metrics=off --error .
+86 -7
@@ -10,11 +10,65 @@ to the best available model — cloud or local — through a multi-armed bandit
router, executes tools on your behalf, and stays extensible through hooks,
skills, MCP servers, and plugins.
Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called
**elfs** (elf owl).
![gnoma TUI showing a routed turn](docs/img/gnoma-tui.png)
- **Upstream:** <https://somegit.dev/Owlibou/gnoma>
- **GitHub mirror:** <https://github.com/VikingOwl91/gnoma>
*Every turn shows which arm the router picked and why — here a local
`qwen3:14b` was selected for a `generation` task.*
## What makes gnoma different
- **Multi-armed bandit router.** Per-prompt arm selection based on
capability gates, declared `Strengths`, latency, and cost. Visible in
the TUI on every turn — no black box.
- **`[router].prefer = local | cloud | auto`.** Pin routing toward local
models, cloud, or let the bandit decide. Offline-first workflows still
reach for Claude when the local model would obviously flail.
- **Tier-0 SLM routing.** A tiny local model classifies each prompt and
handles trivial tasks itself, keeping the heavy provider for real work.
- **Content boundary + secret scanner.** Every outgoing LLM message
and incoming tool result is scanned for secrets (regex + Shannon
entropy on long tokens), redacted or blocked at the content level.
Paths are canonicalised (TOCTOU-safe), Unicode is sanitized
(homoglyphs, BiDi tricks), and a `SafeProvider` boundary keeps
incognito-mode data out of long-lived stores. *(Per-host network
egress allowlist is on the roadmap, not in place today.)*
- **No phone-home.** gnoma itself sends nothing off-machine — zero
analytics endpoint, zero metrics service, no remote logging.
Prompts of course go to whatever provider you route them to:
cloud arms ship data to that provider by design; pair
Ollama/llama.cpp with `--incognito` if you want everything
on-device.
- **Provider-agnostic from day one.** Anthropic, OpenAI, Google, Mistral,
Ollama, llama.cpp, plus subprocess CLIs (`claude`, `codex`, `agy`,
`vibe`). Mix cloud and local in the same session.
- **Vision end-to-end.** `[Image: /path]` markers in prompts, `Ctrl+V`
paste in the TUI, capability-gated per arm.
- **Single static binary.** `CGO_ENABLED=0`, multi-arch container on
ghcr.io. No daemon, no runtime deps.
## Status
Pre-1.0 (current: **v0.3.0**). Single maintainer, breaking changes
possible. The provider, router, and engine surfaces are settling;
config schema and TUI bindings may still shift between minor versions.
Apache 2.0.
## Table of contents
- [Install](#install)
- [Quickstart](#quickstart)
- [Vision / image input](#vision--image-input)
- [Providers](#providers)
- [Config](#config)
- [Routing defaults](#routing-defaults)
- [SLM routing](#slm-small-language-model-routing)
- [Session persistence](#session-persistence)
- [Extensibility](#extensibility)
- [Subcommands](#subcommands)
- [Security](#security)
- [Development](#development)
- [About](#about)
- [License](#license)
---
@@ -418,9 +472,25 @@ built-in batching skill.
gnoma runs tools and shell commands on your behalf. The
[`internal/security`](internal/security) package canonicalises every path
(TOCTOU-safe), gates network access through a configurable firewall, and
scans tool output for secrets before it ever reaches the model. The
`SafeProvider` boundary keeps incognito-mode data out of long-lived stores.
(TOCTOU-safe), scans every outgoing LLM message and incoming tool result
for secrets (regex + Shannon entropy) before it reaches the model, and
sanitizes Unicode (homoglyphs, BiDi tricks). The `SafeProvider` boundary
keeps incognito-mode data out of long-lived stores.
> **Scope note.** The current "firewall" is a content boundary — it
> redacts/blocks secrets in inputs and outputs. It is **not** a
> network-egress firewall: outgoing HTTP from tools and providers goes
> through stock `http.Client`, with no per-host allowlist or
> dial-layer enforcement. Per-host egress rules and a per-session
> audit log of blocked/redacted events are tracked in
> [TODO.md](TODO.md).
>
> **Data flow.** gnoma itself emits no telemetry to external services
> — no analytics, no metrics endpoint, no remote logging. When you
> route to a cloud provider (Anthropic, OpenAI, Google, Mistral),
> prompts and tool data are sent to that provider as required to
> fulfill the request — by design. For fully on-device operation,
> use Ollama or llama.cpp and `--incognito`.
### Entropy false-positive reduction
@@ -498,6 +568,15 @@ Architecture, conventions, and TDD workflow: [CONTRIBUTING.md](CONTRIBUTING.md).
---
## About
Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called
**elfs** (elf owl).
- **Upstream:** <https://somegit.dev/Owlibou/gnoma>
- **GitHub mirror:** <https://github.com/VikingOwl91/gnoma> (read-only;
PRs go to upstream Gitea)
## License
Apache License 2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).
+174 -31
@@ -4,35 +4,171 @@ Active work, newest first.
## In flight
- **Startup safety + context banner** — refuse / warn / OK tier check
on the cwd at launch (refuse in `/etc`, `/sys`, system roots; warn
with keypress in `$HOME`, `/tmp`, common dumping grounds; OK in
anything inside a git repo or with a project marker). Context
banner always shown with cwd, git state, model, modes, and a
top-level sensitive-file inventory. Bypass via
`--dangerously-allow-anywhere`. Complements the in-flight
sensitive-content unified-policy work (this is the pre-flight
layer; that is the runtime layer). See
[`docs/superpowers/plans/2026-05-23-startup-safety-banner.md`](docs/superpowers/plans/2026-05-23-startup-safety-banner.md).
- **Routing-preference policy** — `[router].prefer = "local" | "cloud" | "auto"`
config knob biasing selection via a soft score multiplier
(0.3 / 0.5 / 1.0). Preserves Strengths cross-tier promotion and
the bandit's learning; complements rather than replaces incognito.
Forced arms (`--provider X`) and incognito still take priority.
Closes the original 2026-05-23 session item B (deferred when the
defaults-refresh work landed first). See
[`docs/superpowers/plans/2026-05-23-prefer-routing-policy.md`](docs/superpowers/plans/2026-05-23-prefer-routing-policy.md).
- **Routing defaults refresh** — bake family-keyed `Strengths` +
`MaxComplexity` into discovery so a freshly-pulled local fleet
routes sensibly without any TOML config. Adds a non-chat exclude
list (filters `embeddinggemma`, `kokoros`, `whisper-base`,
`vibevoice`, `*-asr/-tts/-audio/-reranker`), extends
`knownVisionModelPrefixes` (gemma4, glm-ocr), and refreshes the
cloud-side registry (Gemini 3.x, `gpt-5.3-codex`). Closed-model
`Strengths` + `CostWeight` defaults land in the provider modules.
Driven by benchmark snapshot 2026-05-23
(artificialanalysis.ai v4.0, llm-stats.com). See
[`docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md`](docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md).
- **Config write/merge — silent corruption of layered configs.**
`internal/config/write.go:setConfig` reads the existing TOML into a
zero-valued `Config` struct, sets one field, and writes the entire
struct back out — so every untouched field gets serialized at its
Go zero value (empty strings, zero ints, `false` bools). On the
next load, those explicit zeros overwrite higher-priority layers
via `toml.Decode`'s "present field beats absent field" semantics.
Concrete symptom (2026-05-24): user's `~/.config/gnoma/config.toml`
had `[router].prefer = "cloud"` but the project-level
`.gnoma/config.toml` had `prefer = ""` (generated by an earlier
`gnoma config set ...` call), which silently downgraded the
effective policy to `auto` — visible only via the new `/router`
TUI command, with no warning.
Same root cause is responsible for the zero-spammed global config
the same user has (`max_tokens = 0`, `permission.mode = ""`,
`bash_timeout = 0`, etc.) — all overwriting sensible defaults.
**Fix surface (multi-part, plan-worthy):**
1. **Stop generating zero-spam.** Two options:
- Tag struct fields with `,omitempty` so the BurntSushi encoder
skips zero values. Caveat: conflates "unset" with "explicitly
zero" for primitive types (a user who wants `max_keep = 0`
loses it). Safe for strings/maps/slices where empty is never
user-intent; lossy for numeric fields.
- Switch to `pelletier/go-toml/v2` and use its document model
to edit only the targeted key, preserving everything else
byte-for-byte. Cleaner semantics, bigger refactor.
- Hybrid: omitempty on string/map/slice fields, document-level
edit for numerics. Fastest path that doesn't lose intent.
2. **`gnoma doctor` — read-only diagnostic.** Scans both global
and project configs and reports:
- Zero-spam fields that would silently shadow defaults or
upstream layers.
- Invalid enum values (e.g. `permission.mode = ""`).
- Unknown / removed keys from older schema versions.
- Effective-merged values (so the user sees what gnoma will
actually use after layering). No writes. Exits non-zero on
findings so it's CI-friendly.
3. **`gnoma upgrade-config` — active migration.** For each config
file (global, profiles, project):
- Compute the cleaned form (only fields the user actually set,
dropping zeros that match defaults).
- Write the original to `<path>.bak` with timestamp suffix.
- Write the cleaned form to the original path.
- Print a diff of what changed so the user can verify.
4. **Project-level auto-migration on startup.** If gnoma detects
a zero-spammed project `.gnoma/config.toml` at launch:
- Auto-run the upgrade (project-only, never auto-touch the
global config).
- Write `.gnoma/config.toml.bak-YYYY-MM-DD-HHMMSS`.
- Surface a one-line notice in the startup safety banner:
`config: migrated .gnoma/config.toml (see .bak)`.
- The auto-migration is non-destructive (`.bak` preserves
original) but still gated behind a `[config].auto_migrate`
toggle, defaulting to `true`. Global configs require
explicit `gnoma upgrade-config`.
5. **Project registry** (`~/.config/gnoma/projects.json`). Today
there is no record of which directories gnoma has been launched
in — items #2 and #3 can work with a filesystem scan
(`find ~ -type d -name .gnoma`), but a registry makes them
significantly faster and unlocks cross-project features.
Sketch:
```json
{
  "projects": [
    {
      "path": "/home/.../my-repo",
      "first_seen": "2026-04-15T10:30:00Z",
      "last_seen": "2026-05-24T19:23:00Z",
      "session_count": 47
    }
  ]
}
```
Update on every successful startup (record project root,
bump `last_seen` + increment `session_count`). Enables:
- Fast `gnoma doctor --all-projects` without a filesystem walk.
- Cross-project session listing (`gnoma sessions --all`
picker; surface most-recent sessions across the registry).
- `gnoma upgrade-config` that can migrate every known project
in one invocation.
- Future local-only aggregate stats (`gnoma stats`) — still
no-phone-home, just a sum across the registry.
**Caveats and design constraints:**
- The registry file becomes another silent-corruption surface
— must use the same `omitempty` / atomic-write discipline
as the encoder fix in #1, or it'll exhibit the same class
of bug.
- Stale entries (deleted projects). `gnoma doctor` should
detect and offer to prune; do not auto-delete.
- Privacy: this is literally a log of directories the user
has worked in. Local-only, never sent off-machine (per the
no-phone-home positioning), but worth a one-line note in
the Security section of the README so users know it exists.
- Opt-out: `[config].project_registry = false` for users who
don't want this tracked. Default `true`.
- Atomic writes (temp file + rename) so a crash mid-write
doesn't corrupt the file.
Surfaced from the v0.3.1 launch wave (2026-05-24).
Plan:
[`docs/superpowers/plans/2026-05-24-config-migration.md`](docs/superpowers/plans/2026-05-24-config-migration.md).
- **Bandit selector — design decisions deferred.** The current
selector (`internal/router/selector.go:scoreArm`) is greedy
quality-weighted: per-(arm × task-type) EMA scores blended 70/30
with heuristic defaults, divided by CostWeight-adjusted cost. It
is **not** a true multi-armed bandit — no UCB-style exploration
bonus, no Thompson sampling. Tracked as a design question rather
than a must-implement item because of two open dependencies:
1. **Whether to keep numeric EMA at all.** The 2026-05-07 roadmap
(Phase 4) puts re-evaluating bandit learning on hold until the
SLM-driven dispatcher is in production. Three options on the
table: keep bandit as feedback for the SLM, retire EMA in
favour of qualitative outcome summaries fed to the SLM, or
split responsibilities (SLM = intent routing, bandit =
cost/quality within a tier). See
[`docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md)
§Phase 4.
2. **User-tunable selector knobs.** Several constants are
hardcoded today: `qualityAlpha` (EMA smoothing, ~3-sample
memory), the 70/30 observed/heuristic blend,
`strengthScoreBonus` for tagged task types, and the
`DefaultThresholds.Minimum` quality floor. Surfacing these as
`[router.bandit]` config keys would let users tune for their
workloads (faster alpha for shifting model performance, longer
memory for stable fleets) without waiting for the strategic
decision in #1.
Surfaced from the r/coolgithubprojects v0.3.1 launch thread
(2026-05-24, `u/Ha_Deal_5079`).
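A rough illustration of the scoring shape described in this entry, not the contents of `selector.go`: an EMA update, the 70/30 blend, and a division by weighted cost. The function signatures, the multiplicative cost adjustment, and the sample numbers are assumptions made for the sketch.

```go
package main

import "fmt"

// updateEMA folds one new quality sample into the running average.
// A larger alpha forgets old samples faster.
func updateEMA(prev, sample, alpha float64) float64 {
	return alpha*sample + (1-alpha)*prev
}

// blendQuality mixes the observed EMA with the heuristic default.
// observedWeight = 0.7 corresponds to the 70/30 blend.
func blendQuality(observed, heuristic, observedWeight float64) float64 {
	return observedWeight*observed + (1-observedWeight)*heuristic
}

// scoreArm divides blended quality by a weighted cost.
// Assumption: the adjustment is a plain multiplication; the real
// selector may combine cost and CostWeight differently.
func scoreArm(quality, cost, costWeight float64) float64 {
	adjusted := cost * costWeight
	if adjusted <= 0 {
		return quality // free arm: rank on quality alone
	}
	return quality / adjusted
}

func main() {
	ema := updateEMA(0.5, 1.0, 0.5)
	q := blendQuality(ema, 0.6, 0.7)
	fmt.Printf("ema=%.2f quality=%.3f score=%.3f\n", ema, q, scoreArm(q, 2.0, 1.0))
}
```

Exposing `alpha` and `observedWeight` as parameters is what a `[router.bandit]` section would amount to: the formula stays fixed and only the constants move.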
- **Security boundary — egress controls + session audit log.** The
current `Firewall` is a content boundary only (scans messages and
tool results for secrets via regex + Shannon entropy, redacts or
blocks, logs via `log/slog`). It does not enforce network egress —
outgoing HTTP from tools and providers uses stock `http.Client`
with no per-host allowlist or dial-layer interception. Two follow-
ups surfaced from the r/SideProject v0.3.0 launch thread
(2026-05-24, `u/Secret_Theme3192`):
1. **Per-session audit log of blocked/redacted events** —
grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
user can answer "what did the firewall do this session?" in
one command. Today the `slog` output goes to whatever sink is
configured, with no per-session grouping.
2. **Per-host egress allowlist (HTTP transport layer)** — open
design question: host-level (`allow api.openai.com, deny *`)
vs per-tool (`bash can only hit these hosts`). Reply asked
the commenter for their mental model; revisit when feedback
lands. The README and v0.3.0 Reddit post phrasing oversold
"network egress gated"; corrected in the same commit as this
TODO entry.
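A minimal sketch of what one line of such an audit file could look like. The field names and the `record` helper are hypothetical; the only properties taken from the entry and the commit message are one JSON object per line, the matcher name and length instead of the matched bytes, and a no-op under incognito.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// AuditEvent is a hypothetical record shape: it carries the matcher
// name and the token length, never the matched bytes themselves.
type AuditEvent struct {
	Action  string `json:"action"`  // "block", "redact", "warn", "unicode_sanitize"
	Matcher string `json:"matcher"` // which rule fired
	Source  string `json:"source"`  // e.g. "tool_result", "message_text"
	Length  int    `json:"length"`  // length of the matched token
}

// auditLine renders one event as a single JSON line.
func auditLine(ev AuditEvent) (string, error) {
	b, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

// record appends the event to w, or does nothing in incognito mode.
// It reports whether a line was written.
func record(w io.Writer, ev AuditEvent, incognito bool) (bool, error) {
	if incognito {
		return false, nil
	}
	line, err := auditLine(ev)
	if err != nil {
		return false, err
	}
	_, err = io.WriteString(w, line)
	return err == nil, err
}

func main() {
	ev := AuditEvent{Action: "redact", Matcher: "aws_access_key", Source: "tool_result", Length: 20}
	if _, err := record(os.Stdout, ev, false); err != nil {
		fmt.Fprintln(os.Stderr, "audit:", err)
	}
}
```

Because each line is a standalone JSON object, a plain `grep '"action":"block"'` over the file answers the "what did the firewall do this session?" question.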
- **Tool-router specialization (functiongemma)** — gated on telemetry,
not committed. Phase A.2 adds did-switch-rate measurement to the
two-stage `select_category` path; Phase A.3 (LoRA fine-tune of
@@ -65,7 +201,8 @@ Active work, newest first.
warning when the content matches sensitive heuristics, a
consent-gated review step, and consistent treatment across the
three paths. Cross-cuts with Phase F entropy work and the
- outgoing-scan firewall.
+ outgoing-scan firewall. Plan:
+ [`docs/superpowers/plans/2026-05-24-sensitive-content-policy.md`](docs/superpowers/plans/2026-05-24-sensitive-content-policy.md).
- **Distribution — follow-ups.** v0.1.0 shipped (archives on
github.com/VikingOwl91/gnoma/releases, multi-arch images on
ghcr.io/vikingowl91/gnoma). Still optional: Homebrew tap,
@@ -84,7 +221,13 @@ Active work, newest first.
- **Structured output** with JSON schema validation — M12.
- **Native agy JSON output** — switch the subprocess provider to
`--output-format stream-json` once the agy CLI supports it,
- replacing the current prompt-augmentation fallback.
+ replacing the current prompt-augmentation fallback. Until then,
+ agy's `ToolUse` capability is set to `false` (see
+ `internal/provider/subprocess/agent.go` agy entry) — without
+ structured tool-call output, the router would otherwise dispatch
+ tool-needing tasks to agy and the turn would hang on prose
+ hallucinations of tool calls. Flip the capability back to `true`
+ in the same change that lands stream-json parsing.
- **SQLite session persistence** + serve mode — M10.
- **Task learning** (pattern recognition, persistent tasks) — M11.
- **Web UI** (`gnoma web`) — M15.
+40 -14
@@ -2,13 +2,14 @@ package main
  import (
  	"context"
+ 	"crypto/rand"
+ 	"encoding/binary"
  	"encoding/json"
  	"errors"
  	"flag"
  	"fmt"
  	"io"
  	"log/slog"
- 	mrand "math/rand"
  	"os"
  	"os/signal"
  	"path/filepath"
@@ -61,17 +62,17 @@ var (
func main() {
var resumeFlag string
var (
- 	providerName = flag.String("provider", "", "LLM provider (mistral, anthropic, openai, google, ollama, llamacpp)")
- 	model        = flag.String("model", "", "model name (empty = provider default)")
- 	system       = flag.String("system", "", "system prompt override (empty = built-in default)")
- 	apiKey       = flag.String("api-key", "", "API key (or set MISTRAL_API_KEY env)")
- 	maxTurns     = flag.Int("max-turns", 50, "max tool-calling rounds per turn")
- 	permMode     = flag.String("permission", "auto", "permission mode (default, accept_edits, bypass, deny, plan, auto)")
- 	incognito    = flag.Bool("incognito", false, "incognito mode — no persistence, no learning")
- 	profileFlag  = flag.String("profile", "", "config profile to load (empty = default_profile from base config)")
+ 	providerName  = flag.String("provider", "", "LLM provider (mistral, anthropic, openai, google, ollama, llamacpp)")
+ 	model         = flag.String("model", "", "model name (empty = provider default)")
+ 	system        = flag.String("system", "", "system prompt override (empty = built-in default)")
+ 	apiKey        = flag.String("api-key", "", "API key (or set MISTRAL_API_KEY env)")
+ 	maxTurns      = flag.Int("max-turns", 50, "max tool-calling rounds per turn")
+ 	permMode      = flag.String("permission", "auto", "permission mode (default, accept_edits, bypass, deny, plan, auto)")
+ 	incognito     = flag.Bool("incognito", false, "incognito mode — no persistence, no learning")
+ 	profileFlag   = flag.String("profile", "", "config profile to load (empty = default_profile from base config)")
  	allowAnywhere = flag.Bool("dangerously-allow-anywhere", false, "bypass the cwd safety classifier — only use if you know what you're doing")
- 	verbose      = flag.Bool("verbose", false, "enable debug logging")
- 	version      = flag.Bool("version", false, "print version and exit")
+ 	verbose       = flag.Bool("verbose", false, "enable debug logging")
+ 	version       = flag.Bool("version", false, "print version and exit")
)
flag.StringVar(&resumeFlag, "resume", "", "resume session by ID (omit ID to list sessions)")
flag.StringVar(&resumeFlag, "r", "", "resume session (shorthand)")
@@ -396,7 +397,17 @@ func main() {
// Create router and register the provider as a single arm
// (M4 foundation: one provider from CLI. Multi-provider routing comes with config.)
- rtr := router.New(router.Config{Logger: logger})
+ // BanditParams come from [router.bandit] config keys; zero values
+ // resolve to built-in defaults inside the router package.
+ rtr := router.New(router.Config{
+ 	Logger: logger,
+ 	Bandit: router.BanditParams{
+ 		QualityAlpha:    cfg.Router.Bandit.QualityAlpha,
+ 		MinObservations: cfg.Router.Bandit.MinObservations,
+ 		ObservedWeight:  cfg.Router.Bandit.ObservedWeight,
+ 		StrengthBonus:   cfg.Router.Bandit.StrengthBonus,
+ 	},
+ })
// Apply the prefer-routing-policy from config (default: auto).
// Invalid values are rejected here with an actionable error rather
@@ -656,10 +667,14 @@ func main() {
}
permChecker := permission.NewChecker(permission.Mode(*permMode), permRules, pipePromptFn)
- // Generate session-scoped ID for /tmp artifact directory
+ // Generate session-scoped ID for /tmp artifact directory.
+ // Use crypto/rand so the suffix isn't predictable even if a future
+ // caller seeds math/rand deterministically (e.g., in tests).
+ var randBuf [8]byte
+ _, _ = rand.Read(randBuf[:])
  sessionID := fmt.Sprintf("%s-%06x",
  	time.Now().Format("20060102-150405"),
- 	mrand.Int63()&0xffffff,
+ 	binary.BigEndian.Uint64(randBuf[:])&0xffffff,
  )
// Pass the firewall's incognito mode so Save no-ops while incognito
// is active. Mode is consulted on every Save (dynamic), so TUI
@@ -667,6 +682,17 @@ func main() {
store := persist.New(sessionID, fw.Incognito())
logger.Debug("session store initialized", "dir", store.Dir())
+ // Per-session firewall audit log: append-only JSONL at
+ // <projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl. Honours
+ // incognito (writes skipped when active) and tolerates fs errors —
+ // scan pipeline never depends on the audit succeeding.
+ auditPath := filepath.Join(gnomacfg.ProjectRoot(), ".gnoma", "sessions", sessionID, "audit.jsonl")
+ fw.SetAudit(security.NewAuditLogger(security.AuditLoggerConfig{
+ 	Path:      auditPath,
+ 	Incognito: fw.Incognito(),
+ 	Logger:    logger,
+ }))
// Create elf manager and register agent tools.
// Must be created after fw and permChecker so elfs inherit security layers.
elfMgr := elf.NewManager(elf.ManagerConfig{
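The session-ID change in the hunk above can be read in isolation as the following sketch. `newSessionID` and `fixedClock` are hypothetical names introduced here; the format string and the 24-bit mask are taken from the diff, and unlike the diff this version returns the `rand.Read` error rather than discarding it.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// newSessionID builds a timestamp plus 24 random bits rendered as six
// hex digits, drawing the bits from crypto/rand.
func newSessionID(now time.Time) (string, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	suffix := binary.BigEndian.Uint64(buf[:]) & 0xffffff
	return fmt.Sprintf("%s-%06x", now.Format("20060102-150405"), suffix), nil
}

// fixedClock is a hypothetical helper so the timestamp part is reproducible.
func fixedClock() time.Time {
	return time.Date(2026, 5, 24, 22, 51, 33, 0, time.UTC)
}

func main() {
	id, err := newSessionID(fixedClock())
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // the timestamp prefix is fixed; the hex suffix varies per run
}
```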
Binary file not shown. Size: 306 KiB
@@ -1,5 +1,10 @@
# Routing-Preference Policy — 2026-05-23
> **Status: shipped in v0.3.0.** Commit `f9094f6`. Implementation
> diverged from the original plan (tier-shift instead of pure score
> multiplier) — see "Implementation note" in the Approach section.
> All P-1 through P-7 tasks complete.
Adds a config knob that biases routing toward local arms, toward
cloud arms, or leaves the current tier+score behavior unchanged.
Originally surfaced as item B in the 2026-05-23 routing redesign
@@ -1,5 +1,10 @@
# Routing Defaults Refresh — 2026-05-23
> **Status: shipped in v0.3.0.** Commits `a79e991` (scaffold) →
> `9bb775a` (full local family table) → `2f8d4c4` (cloud defaults
> + gpt-5.3-codex) → `c99b2c6` (README). All R-1 through R-8
> tasks complete.
Refreshes gnoma's per-arm routing defaults so that out-of-the-box
selection produces sensible choices without requiring users to write
a `[[arms]]` block in TOML. Surfaced during the 2026-05-23 session
@@ -1,5 +1,11 @@
# Startup Safety + Context Banner — 2026-05-23
> **Status: shipped in v0.3.0.** Commits `3eeb5b4` (classifier +
> banner + main.go wiring) → `8ba77c1` (env-template precision
> fix, label alignment, banner-under-bypass). All S-1 through
> S-7 tasks complete; S-8 docs done in `d206b3c`. Windows path
> handling still deferred per plan.
Adds a pre-launch safety check that warns or refuses when gnoma is
started in a directory where it could do real damage (`$HOME`,
`/`, `/etc`, etc.), plus a context banner shown on every launch
@@ -0,0 +1,356 @@
# Config Migration — 2026-05-24
Fixes the silent-corruption pattern in `internal/config/write.go`
that produces zero-spammed config files, adds reader-side telemetry
to surface the resulting layering bugs (`gnoma doctor`), ships an
active migration command (`gnoma upgrade-config`), wires automatic
project-level migration on startup, and introduces a per-user
project registry so all of the above can operate cross-project.
Surfaces in TODO.md as "Config write/merge — silent corruption of
layered configs" with five sub-items; this plan promotes that entry
out of the bullet form into a phased design.
---
## Problem
`setConfig()` in `internal/config/write.go` reads the existing TOML
into a zero-valued `Config` struct, mutates one field, and writes
the entire struct back out. The encoder doesn't skip zero values,
so every untouched field gets serialized at its Go default — empty
strings, zero ints, `false` bools, empty maps.
The next layered load (`Load()` → `toml.Decode` over multiple
files) then **does not** treat those present-but-zero fields as
"unset": the decoder's "present field wins" semantics mean those
zeros overwrite values set by lower-priority layers. Concrete
failure observed 2026-05-24:
- User's global `~/.config/gnoma/config.toml` has
`[router].prefer = "cloud"`.
- An earlier `gnoma config set ...` call generated a project-level
`.gnoma/config.toml` containing `[router].prefer = ""`.
- The merge collapses to `Prefer = ""`, which
`ParsePreferPolicy("")` maps to `PreferAuto`.
- The TUI's `/router` command reads `auto` despite the global
config saying `cloud`. No warning, no error — purely silent.
Same root cause produces zero-spammed global configs
(`max_tokens = 0`, `permission.mode = ""`, etc.) that silently
override sensible defaults in `internal/config/defaults.go`.
This affects every layered field — provider, permission, tools,
session, router, security, slm. Cannot be patched per-field;
needs a structural fix.
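The failure mode can be reproduced in a few lines. The sketch below uses `encoding/json` from the standard library as a stand-in for the TOML decoder, on the assumption that both only assign fields that are present in the input; the two-field `Config` and the layer contents are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type RouterSection struct {
	Prefer string `json:"prefer"`
}

type Config struct {
	Router    RouterSection `json:"router"`
	MaxTokens int           `json:"max_tokens"`
}

// loadLayers decodes each layer into the same struct, lowest priority
// first. A key present in a later layer overwrites the earlier value
// even when it holds a zero; an absent key leaves it untouched.
func loadLayers(layers ...string) (Config, error) {
	var cfg Config
	for _, layer := range layers {
		if err := json.Unmarshal([]byte(layer), &cfg); err != nil {
			return Config{}, err
		}
	}
	return cfg, nil
}

const (
	globalLayer  = `{"router": {"prefer": "cloud"}, "max_tokens": 4096}`
	zeroSpammed  = `{"router": {"prefer": ""}, "max_tokens": 0}`
	cleanProject = `{}`
)

func main() {
	for _, project := range []string{zeroSpammed, cleanProject} {
		cfg, err := loadLayers(globalLayer, project)
		if err != nil {
			fmt.Println("load:", err)
			return
		}
		fmt.Printf("project=%s -> prefer=%q max_tokens=%d\n", project, cfg.Router.Prefer, cfg.MaxTokens)
	}
}
```

With the zero-spammed project layer the global `cloud` preference is lost; with the clean layer it survives, which is the behaviour the rest of this plan aims to restore.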
---
## Non-goals
- **Schema redesign.** The current `Config` struct stays as-is.
This plan addresses how it's written and read, not what fields
exist.
- **Validation.** Future work; `gnoma doctor` will flag obviously
invalid values (empty enum strings, etc.) but a full validation
pass against the schema is out of scope here.
- **Migration of the bandit-router quality JSON.** Unrelated file,
unrelated format, separate concerns.
---
## Approach overview
Five phases, in dependency order:
1. **Encoder fix** — stop generating zero-spam in the first place.
2. **Project registry** — `~/.config/gnoma/projects.json` so later
phases can operate cross-project without filesystem walks.
3. **`gnoma doctor`** — read-only diagnostic, scans global +
project configs (via registry), reports zero-spam, invalid
enums, removed keys, and the effective-merged view.
4. **`gnoma upgrade-config`** — active migration with `.bak`
backup + diff output; targets one file or all known projects.
5. **Auto-migration on startup** — when launch detects a
zero-spammed project config, run upgrade-config silently with
a banner-line notice.
Phases 1 + 2 land first. 3 builds on 1 + 2. 4 builds on 3. 5
builds on 4.
---
## Phase 1 — Encoder fix
`setConfig()` is the bug generator. The TOML library
(`BurntSushi/toml`) supports `omitempty` on struct tags but the
project's `Config` struct doesn't use it. Three options:
### Option A — `omitempty` on all fields
Tag every field with `,omitempty`. The encoder skips fields at
their Go zero value. **Caveat:** conflates "unset" with
"explicitly zero" for primitive types — a user who actually
wants `max_keep = 0` (no session retention) loses that setting on
the next write.
### Option B — `pelletier/go-toml/v2` document model
Switch encoder to a TOML library that exposes a document AST.
Edit only the targeted key, preserve everything else byte-for-byte.
Cleaner semantics, bigger refactor — also affects the decoder side.
### Option C (chosen) — hybrid
Use `omitempty` for fields where the Go zero value is never
user-intent (strings, maps, slices). For numeric fields where 0
is a legitimate user choice, switch the field to a pointer
(`*int`, `*float64`) so `nil` means "unset" and a pointer to `0`
means "explicitly zero". On decode, fall back to defaults for nil
pointers in the resolution layer.
This keeps the existing BurntSushi library, preserves user intent
across the full type space, and limits churn to the fields where
the zero/unset ambiguity actually matters.
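A small sketch of the hybrid tagging, again with `encoding/json` standing in for the TOML encoder (its `omitempty` skips empty strings and nil pointers but keeps a non-nil pointer to zero). The `SessionSection` fields are illustrative, and the exact tag options honoured by `BurntSushi/toml` should be checked against its documentation before applying the pattern.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SessionSection is a hypothetical section using the hybrid tagging.
type SessionSection struct {
	Dir     string `json:"dir,omitempty"`      // "" is never user intent
	MaxKeep *int   `json:"max_keep,omitempty"` // nil = unset, pointer to 0 = explicit zero
}

func intPtr(v int) *int { return &v }

// encode returns the serialized form of the section.
func encode(s SessionSection) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(SessionSection{}))                   // {}
	fmt.Println(encode(SessionSection{MaxKeep: intPtr(0)})) // {"max_keep":0}
}
```

The unset section serializes to nothing, so it cannot shadow another layer, while the explicit `max_keep = 0` survives the write.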
### Phase 1 task list
- **P1-1:** Audit every `Config`-tree field. Tag string/map/slice
fields with `,omitempty`. List numeric/bool fields that need
pointer conversion.
- **P1-2:** Convert numeric/bool fields requiring zero-vs-unset
distinction to pointers. Update construction sites and getters.
- **P1-3:** Add a `Resolve()` method on `Config` that walks the
struct and substitutes default values for nil pointers, called
exactly once at the end of `Load()`. All consumer code reads
resolved values; raw layered structs are internal.
- **P1-4:** Tests covering: (a) write-then-read roundtrip
preserves only user-set fields, (b) explicit zero (e.g.
`max_keep = 0`) survives the roundtrip, (c) field absent from
TOML resolves to default.
- **P1-5:** Backwards-compat: when reading an existing zero-spammed
file, the resolver must treat all-zeros-in-a-section as the
default — see Phase 5 for the heuristic.
---
## Phase 2 — Project registry
New file at `~/.config/gnoma/projects.json`:
```json
{
  "projects": [
    {
      "path": "/home/user/git/foo",
      "first_seen": "2026-04-15T10:30:00Z",
      "last_seen": "2026-05-24T19:23:00Z",
      "session_count": 47
    }
  ]
}
```
### Phase 2 task list
- **P2-1:** Add `internal/config/registry.go` with `Registry`,
`Load`, `Save`, `Record(projectRoot)`, `Prune(staleAfter time.Duration)`.
- **P2-2:** Save uses atomic-write (temp file + `os.Rename`) so a
crash mid-write doesn't corrupt the file.
- **P2-3:** Call `Registry.Record(projectRoot)` from
`cmd/gnoma/main.go` right after the startup-safety banner
decides to proceed. Failure is logged at Warn level but never
blocks startup.
- **P2-4:** Add `[config].project_registry` toggle in defaults.go
(bool, default `true`). When `false`, Record is a no-op.
- **P2-5:** Document the file in README §Security as part of the
no-phone-home scope note: this is purely local, never sent.
- **P2-6:** Tests: round-trip, atomic-write under fault injection,
toggle off path.
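One way P2-1 and P2-2 could fit together, as a sketch under assumptions: the method signatures differ from the task list (`Record` takes an explicit timestamp so it can be tested), `at` is a helper invented for the example, and pruning and the opt-out toggle are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

type Project struct {
	Path         string    `json:"path"`
	FirstSeen    time.Time `json:"first_seen"`
	LastSeen     time.Time `json:"last_seen"`
	SessionCount int       `json:"session_count"`
}

type Registry struct {
	Projects []Project `json:"projects"`
}

// Record bumps an existing entry or appends a new one.
func (r *Registry) Record(root string, now time.Time) {
	for i := range r.Projects {
		if r.Projects[i].Path == root {
			r.Projects[i].LastSeen = now
			r.Projects[i].SessionCount++
			return
		}
	}
	r.Projects = append(r.Projects, Project{Path: root, FirstSeen: now, LastSeen: now, SessionCount: 1})
}

// Save writes to a temp file in the target directory and renames it
// into place, so readers never observe a partially written registry.
func (r *Registry) Save(path string) error {
	data, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".projects-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// at parses an RFC 3339 timestamp for the example.
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	var reg Registry
	reg.Record("/home/user/git/foo", at("2026-04-15T10:30:00Z"))
	reg.Record("/home/user/git/foo", at("2026-05-24T19:23:00Z"))

	dir, err := os.MkdirTemp("", "gnoma-registry-")
	if err != nil {
		fmt.Fprintln(os.Stderr, "registry:", err)
		return
	}
	defer os.RemoveAll(dir)
	if err := reg.Save(filepath.Join(dir, "projects.json")); err != nil {
		fmt.Fprintln(os.Stderr, "registry:", err)
		return
	}
	fmt.Println("entries:", len(reg.Projects), "sessions:", reg.Projects[0].SessionCount)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the final step atomic on POSIX systems.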
---
## Phase 3 — `gnoma doctor`
New subcommand. Read-only. Scans:
- Global config at `GlobalConfigPath()`.
- Every project in the registry (or filesystem-scan fallback when
the registry is disabled or empty).
- Active profile (when profile mode is on).
Reports per-file:
- **Zero-spam fields** — present-with-zero where higher layer or
default has non-zero. The very thing this plan exists to fix.
- **Invalid enum values** — `permission.mode = ""`,
`router.prefer = "yes"`, etc. Use existing parsers to detect.
- **Unknown keys** — fields in the TOML that don't map to any
`Config` struct field. Decoder ignores these silently today;
doctor surfaces them.
- **Removed keys** — known-historical fields from older schema
versions; suggest removal.
Reports per-stack:
- **Effective-merged values** — what gnoma will actually use after
layering. Helps the user see whether a project file is masking
a global setting.
### Phase 3 task list
- **P3-1:** Add `cmd/gnoma/doctor_cmd.go` with the subcommand
scaffold.
- **P3-2:** `internal/config/doctor.go` with the scan logic;
exported `Diagnose(paths []string) []Finding`.
- **P3-3:** Output: human format by default, `--json` for
CI/script consumption.
- **P3-4:** Exit non-zero when findings have severity ≥ Warn so
doctor is CI-friendly.
- **P3-5:** `--all-projects` flag (default off; uses registry).
- **P3-6:** Tests covering each finding type.
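To make two of the finding types concrete, here is a sketch of a per-file check over flattened key/value pairs. It is not the planned `Diagnose(paths []string)` signature: `diagnoseLayer`, the `Finding` fields, the dotted key names, and the default values are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Finding is a hypothetical report entry.
type Finding struct {
	Kind string // "zero_spam" or "unknown_key"
	Key  string
}

// isZero reports whether v is nil or the zero value of its type.
func isZero(v any) bool {
	return v == nil || reflect.ValueOf(v).IsZero()
}

// diagnoseLayer inspects the flat key/value pairs found in one file.
// defaults maps every known key to its built-in default.
func diagnoseLayer(layer, defaults map[string]any) []Finding {
	keys := make([]string, 0, len(layer))
	for k := range layer {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output for diffs and tests

	var out []Finding
	for _, k := range keys {
		def, known := defaults[k]
		switch {
		case !known:
			out = append(out, Finding{Kind: "unknown_key", Key: k})
		case isZero(layer[k]) && !isZero(def):
			// Present with a zero value while the default is non-zero:
			// this entry would silently shadow the default.
			out = append(out, Finding{Kind: "zero_spam", Key: k})
		}
	}
	return out
}

func main() {
	defaults := map[string]any{"router.prefer": "auto", "max_tokens": 8192, "session.max_keep": 0}
	layer := map[string]any{"router.prefer": "", "max_tokens": 0, "session.max_keep": 0, "legacy.flag": true}
	for _, f := range diagnoseLayer(layer, defaults) {
		fmt.Printf("%-12s %s\n", f.Kind, f.Key)
	}
}
```

A real implementation would also compare against the layer below rather than only the defaults, and would map a non-empty result to a non-zero exit code.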
---
## Phase 4 — `gnoma upgrade-config`
Active migration. Writes:
- Original file → `<path>.bak-YYYYMMDD-HHMMSS` (deterministic
timestamp suffix).
- Cleaned content → original path.
- Stdout: unified diff of what changed.
### Phase 4 task list
- **P4-1:** Add `cmd/gnoma/upgrade_config_cmd.go`.
- **P4-2:** `internal/config/upgrade.go` with `Upgrade(path string)`
→ reads file, applies the Phase 1 cleaning (drop fields equal to
their resolved default, keep explicit zeros that diverge from the
default via the pointer semantics).
- **P4-3:** Two-step write: rename original to `.bak-...`, then
atomic-write new content to original path. A crash between the
two steps leaves the `.bak` intact and no file at the original
path, never a half-written config.
- **P4-4:** `--all-projects` flag using the registry.
- **P4-5:** `--dry-run` prints diffs without writing.
- **P4-6:** Tests: round-trip of zero-spammed input → cleaned
output → identical re-read; idempotency (running twice yields
no second `.bak`).
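The backup naming and the idempotency requirement from P4-6 can be sketched without touching disk. `planUpgrade`, `backupPath`, and `stamp` are hypothetical names; the file writes themselves are deliberately left out.

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

// backupPath appends a timestamp suffix in the .bak-YYYYMMDD-HHMMSS form.
func backupPath(path string, now time.Time) string {
	return path + ".bak-" + now.Format("20060102-150405")
}

// planUpgrade decides what an upgrade run would do. It reports no
// change, and therefore no backup, when the cleaned content already
// matches the file; that is what keeps a second run from producing a
// second .bak.
func planUpgrade(path string, original, cleaned []byte, now time.Time) (backup string, changed bool) {
	if bytes.Equal(original, cleaned) {
		return "", false
	}
	return backupPath(path, now), true
}

// stamp is a fixed clock for the example.
func stamp() time.Time {
	return time.Date(2026, 5, 24, 22, 51, 33, 0, time.UTC)
}

func main() {
	original := []byte("[router]\nprefer = \"\"\n")
	cleaned := []byte("")

	backup, changed := planUpgrade(".gnoma/config.toml", original, cleaned, stamp())
	fmt.Println("first run: ", changed, backup)

	backup, changed = planUpgrade(".gnoma/config.toml", cleaned, cleaned, stamp())
	fmt.Println("second run:", changed, backup)
}
```

The same plan/apply split also gives `--dry-run` for free: print the plan and the diff, skip the apply step.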
---
## Phase 5 — Auto-migration on startup
When `Load()` parses a project `.gnoma/config.toml` and the
heuristic flags it as zero-spammed (every field at the Go zero
value, no user content), gnoma:
- Runs the Phase 4 upgrade in-process.
- Writes `.gnoma/config.toml.bak-...`.
- Emits a single line to the startup safety banner:
`config: migrated .gnoma/config.toml (see .bak)`.
- Continues startup with the cleaned config.
### Heuristic for "zero-spam"
A config section is zero-spam if **all** of these hold:
- Every primitive field present in the file is at its Go zero
value.
- No `[[arms]]`, `[[mcp_servers]]`, or `[[hooks]]` blocks (those
are always user content).
- File modification time ≥ 24h old (so we don't migrate a config
the user is actively editing).
If only some fields are zero and some are user-set, we don't touch
it — the user's mix of explicit zeros and meaningful values takes
precedence.
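The three conditions translate fairly directly into a predicate. The sketch below works on a hypothetical `FileFacts` summary rather than the `*Config` argument named in the task list, since the conditions need the file's modification time and the set of keys actually present; an empty file trivially passes the first condition, and whether that should count is left open here.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// FileFacts is a hypothetical summary of one parsed project config.
type FileFacts struct {
	Primitives    map[string]any // primitive keys present in the file
	HasUserBlocks bool           // any [[arms]], [[mcp_servers]] or [[hooks]]
	ModTime       time.Time
}

// isZeroSpam applies the three conditions from the heuristic.
func isZeroSpam(f FileFacts, now time.Time) bool {
	if f.HasUserBlocks {
		return false // table arrays are always user content
	}
	if now.Sub(f.ModTime) < 24*time.Hour {
		return false // possibly being edited right now
	}
	for _, v := range f.Primitives {
		if v != nil && !reflect.ValueOf(v).IsZero() {
			return false // at least one meaningful value: leave the file alone
		}
	}
	return true
}

// refNow and hoursBefore give the example a fixed clock.
func refNow() time.Time { return time.Date(2026, 5, 24, 22, 0, 0, 0, time.UTC) }

func hoursBefore(h int) time.Time {
	return refNow().Add(-time.Duration(h) * time.Hour)
}

func main() {
	spam := FileFacts{
		Primitives: map[string]any{"prefer": "", "max_tokens": 0, "verbose": false},
		ModTime:    hoursBefore(48),
	}
	mixed := FileFacts{
		Primitives: map[string]any{"prefer": "cloud", "max_tokens": 0},
		ModTime:    hoursBefore(48),
	}
	fmt.Println("all zeros, 48h old:", isZeroSpam(spam, refNow()))
	fmt.Println("mixed, 48h old:    ", isZeroSpam(mixed, refNow()))
}
```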
### Phase 5 task list
- **P5-1:** Add `isZeroSpam(*Config) bool` heuristic in
`internal/config/upgrade.go`.
- **P5-2:** Wire from `Load()` post-merge: if `isZeroSpam` flags the
project layer → call `Upgrade` on the project file, log via banner.
- **P5-3:** Add `[config].auto_migrate` toggle, default `true`.
Global configs are never auto-migrated; only project-level.
- **P5-4:** Banner integration: the existing safety banner gets
a new optional line for "config notices" right under the
cwd/sensitivity summary.
- **P5-5:** Tests: zero-spam project file gets migrated; mixed
project file is left alone; recently-modified file is left
alone; auto_migrate=false disables.
---
## Cross-cutting: schemas and resolution
The pointer-field design (Phase 1) needs a clear resolution layer.
Proposal: every Config section gets a `Resolved...Section` mirror
that has plain (non-pointer) types. After Load, the resolver
populates one from the other, substituting defaults for nils.
Examples already exist in the codebase: `ResolvedSafetySection`
mirrors `SafetySection`. The pattern is established; we just need
to extend it.
Consumer-side: code reads from `cfg.Resolved.X` not `cfg.X`.
Loud renaming will catch any reader still using the raw layered
struct.
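A minimal sketch of the raw/resolved split for a single section. The `SessionSection` shape, the default values, and the per-section `Resolve` method are assumptions for illustration; the plan itself only fixes the naming pattern (`ResolvedXSection`) and the rule that nil falls back to the default.

```go
package main

import "fmt"

// SessionSection is the raw, layered form: nil means "not set in any file".
type SessionSection struct {
	MaxKeep *int
	Dir     string
}

// ResolvedSessionSection is what consumers read: plain values only.
type ResolvedSessionSection struct {
	MaxKeep int
	Dir     string
}

// Hypothetical defaults for the sketch.
const (
	defaultMaxKeep = 20
	defaultDir     = ".gnoma/sessions"
)

// Resolve substitutes defaults for unset fields and keeps explicit values.
func (s SessionSection) Resolve() ResolvedSessionSection {
	out := ResolvedSessionSection{MaxKeep: defaultMaxKeep, Dir: defaultDir}
	if s.MaxKeep != nil {
		out.MaxKeep = *s.MaxKeep // explicit value, including an explicit 0
	}
	if s.Dir != "" {
		out.Dir = s.Dir
	}
	return out
}

func main() {
	zero := 0
	unset := SessionSection{}
	explicit := SessionSection{MaxKeep: &zero}
	fmt.Printf("unset:    %+v\n", unset.Resolve())
	fmt.Printf("explicit: %+v\n", explicit.Resolve())
}
```

Because consumers only ever see the resolved type, a reader that still reaches for the raw struct fails to compile on the pointer field instead of silently reading a zero.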
---
## Risks
- **Pointer-field migration is wide-scope.** Every reader of the
affected fields needs to change. Mitigated by the
resolver-mirror pattern (`ResolvedXSection`) — readers move from
one struct to another, but the call sites don't change shape.
- **Auto-migration writes silently.** Users might be surprised
even with the banner notice. Mitigated by `.bak` preservation
and the heuristic only firing on files that are obviously
zero-spam.
- **Registry becomes the same class of bug.** Documented in the
TODO entry already; Phase 2 explicitly requires atomic-write
and `omitempty` discipline. If we get this wrong the fix is the
same shape as Phase 1.
- **Privacy.** The registry is a list of directories the user has
worked in. Local-only, opt-out toggle, README note required.
- **Backwards compatibility for tests.** Tests that construct
`Config` by hand with explicit zeros may need updating.
Approach: add a `MustResolve` helper for test construction so
tests don't need to know about the pointer/resolver split.
---
## Rollout
Phases 1 + 2 ship together as a single release (encoder fix
needs the resolver, registry is independent but small). Tag as
`v0.4.0` — schema-touching changes warrant a minor bump per
the project's pre-1.0 semver discipline.
Phase 3 (`gnoma doctor`) can ship in a `v0.4.x` patch — it's
read-only and adds no surface compatibility risk.
Phase 4 (`gnoma upgrade-config`) ships in a follow-up `v0.4.x`.
Phase 5 (auto-migration) ships once Phase 4 has been in the wild
for at least one release cycle, so users have a way to opt in /
inspect before it becomes implicit.
---
## Open questions
- Should `gnoma doctor` also check that the `quality.json` file
is well-formed? Same dir, different concern — probably belongs
in doctor's scope as the umbrella "diagnose my gnoma install"
command.
- Registry size cap? After a year of usage on a busy machine
the file could grow to a few thousand entries. Reasonable; no
cap planned, but `Prune(staleAfter)` exposed for users who
want manual cleanup.
- Profiles: how do profile configs interact with the doctor /
upgrade flow? Default: treat each profile file as its own
upgradeable unit. Doctor lists findings per-profile.
@@ -0,0 +1,278 @@
# Sensitive Content — Unified Policy — 2026-05-24
Promotes the "sensitive-content handling — unified policy" TODO
entry into a phased design. Three input paths can introduce
sensitive content into the conversation context — pasted images,
pasted text, and tool-read files. Today each path has different
defences; this plan unifies them behind a single policy with a
single consent UI.
Sibling concerns:
[`2026-05-19-post-slm-unlock.md`](2026-05-19-post-slm-unlock.md)
Phase F (entropy detection) and the outgoing-scan firewall
already cover detection in some places; this plan unifies the
*decision* layer that sits in front of them.
---
## Problem
Three input paths to the engine carry distinct sensitivity
risks; each is handled differently today.
### Path 1 — Pasted images (Ctrl+V in the TUI)
Screenshot might contain API keys, terminal output with creds,
private repo contents, family photos, etc. Today:
- Image bytes land in the user cache dir.
- The router only sends to vision-capable arms.
- Local arms are fine; cloud arms send full image content to
the provider.
- Incognito skips paste entirely (per the no-persistence
contract).
What's missing: at-paste preview / warning. The user often does
not realise what the screenshot contained until after it's been
sent.
### Path 2 — Pasted text
User pastes a chunk into the input composer. Could be a log
snippet with credentials, an `.env` file content, an SSH key,
or just text. Today:
- Goes straight into the input buffer with no scanning.
- Outgoing firewall scans the final composed message before
send — *after* the user has already pressed Enter, often
redacting silently in the background.
- The user sees `[REDACTED]` in their own message after the
fact, no consent step.
What's missing: at-paste detection so the user sees the warning
*before* committing to send.
### Path 3 — Tool-read files
`fs_read`, `bash`, etc. surface file contents to the model. Today:
- Outgoing firewall scans tool *results* before they reach the
next provider turn (`ScanToolResult`).
- Format-aware entropy detection (Phase F-1) reduces false
positives on UUIDs / SHA / ISO timestamps.
- The audit log (just shipped) records what got blocked /
redacted per session.
What's missing: nothing structurally on this path; it's the
most-mature of the three. Listed here only for completeness so
the unified policy can be honest about asymmetric coverage.
### The unification question
These three paths converge into "content that joins the context
window." A consistent policy needs to answer, for each path:
1. **When** does detection run? (at paste / at send / at receive)
2. **What** does the user see? (warning / preview / redacted
placeholder / silent)
3. **What** is their consent gate? (approve / deny / approve-with-
redaction / skip)
4. **Where** is the action recorded? (audit log, banner, slog)
Today the answers vary per path. This plan picks one set of
answers and applies them everywhere.
---
## Non-goals
- **New detectors.** This plan reuses the existing scanner
(regex + entropy + unicode-sanitize). Phase F-2's SLM-assisted
detector lands separately when telemetry warrants.
- **Egress allowlist.** Tracked in the security-boundary TODO
entry, separate plan.
- **Provider-side redaction.** That's the provider's problem.
This plan is about what leaves gnoma's process.
---
## Approach
Single policy module: `internal/security/sensitive_policy.go`.
Exposes one decision function:
```go
type Decision int

const (
	DecisionAllow Decision = iota
	DecisionWarn           // show warning, allow on confirm
	DecisionRedactAndAllow
	DecisionBlock
)

type Inspection struct {
	Path       string          // "paste_text", "paste_image", "tool_result"
	Content    string          // for text paths
	ImageBytes []byte          // for image paths; nil otherwise
	Matches    []scanner.Match // pre-scanned hits
}

func Decide(insp Inspection, mode IncognitoMode, prefs Preferences) Decision
```
All three paths route through `Decide` with their own
`Inspection`. UI surface — the at-paste prompt, the at-send
warning, the redacted-placeholder view — sits in the TUI and is
driven by the Decision value.
### Path-specific wiring
| Path | When | UI | Default Decision rules |
|---|---|---|---|
| paste_text | Ctrl+V into composer | Inline warning under input box, with `Tab` to expand match details | Match in scanner → `Warn` (text stays, user dismisses); explicit block-tier match → `Block` (paste dropped) |
| paste_image | Ctrl+V image | Pre-paste OCR scan (small local model) + warning before insertion | OCR finds secret pattern → `Warn`; user can choose `Redact` (image kept, warning attached) or `Cancel`. Incognito → `Block` (already today). |
| tool_result | After tool runs | Banner: `firewall: redacted N items in this tool result` | Existing behaviour. `Decide` invoked just to keep the API surface consistent; matches go to audit log. |
### Preferences
New `[security.sensitive]` config section:
```toml
[security.sensitive]
warn_on_paste_text = true # default true
warn_on_paste_image = true # default true
ocr_image_paste = false # opt-in: requires local vision arm
auto_redact = false # default false: ask first, redact second
silent_tool_results = false # default false: show banner when redactions happen
```
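Two of these flags default to `true`, so a plain `bool` field cannot tell "key absent" from "explicitly false". One option, sketched here under assumptions, is to decode into pointer fields and resolve them against the defaults afterwards. `SensitiveSection`, `Preferences`, `orDefault`, and `Resolve` are hypothetical names, and the `toml` tags assume the same tag-based decoding the existing config structs use.

```go
package main

import "fmt"

// SensitiveSection is a hypothetical decode target for
// [security.sensitive]. A nil pointer means the key was absent.
type SensitiveSection struct {
	WarnOnPasteText   *bool `toml:"warn_on_paste_text"`
	WarnOnPasteImage  *bool `toml:"warn_on_paste_image"`
	OCRImagePaste     *bool `toml:"ocr_image_paste"`
	AutoRedact        *bool `toml:"auto_redact"`
	SilentToolResults *bool `toml:"silent_tool_results"`
}

// Preferences holds the resolved values the policy would consume.
type Preferences struct {
	WarnOnPasteText   bool
	WarnOnPasteImage  bool
	OCRImagePaste     bool
	AutoRedact        bool
	SilentToolResults bool
}

// orDefault returns *p, or def when the key was not set.
func orDefault(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// Resolve applies the defaults documented in the TOML block above.
func Resolve(s SensitiveSection) Preferences {
	return Preferences{
		WarnOnPasteText:   orDefault(s.WarnOnPasteText, true),
		WarnOnPasteImage:  orDefault(s.WarnOnPasteImage, true),
		OCRImagePaste:     orDefault(s.OCRImagePaste, false),
		AutoRedact:        orDefault(s.AutoRedact, false),
		SilentToolResults: orDefault(s.SilentToolResults, false),
	}
}

func main() {
	off := false
	// Only warn_on_paste_text is set; everything else takes its default.
	fmt.Printf("%+v\n", Resolve(SensitiveSection{WarnOnPasteText: &off}))
}
```

An alternative with the same effect is to pre-populate a plain-`bool` struct with defaults before decoding; which one fits better depends on how the config encoder treats zero values.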
### Incognito interaction
When incognito is active, **every** Decision is treated as either
`Block` or `RedactAndAllow` — never `Warn`-then-`Allow`. Incognito
implies "I don't trust this conversation to persist"; the
sensible default is to be strict about what flows in.
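The incognito rule can be pictured as a post-processing step on whatever the normal rules returned. This is a sketch under stated assumptions: `tightenForIncognito` is a hypothetical helper, downgrading `Warn` to `RedactAndAllow` (rather than `Block`) is one possible choice, and letting match-free input through as `Allow` is an assumption, since there is nothing to redact in that case.

```go
package main

import "fmt"

// Decision mirrors the enum proposed in the plan.
type Decision int

const (
	DecisionAllow Decision = iota
	DecisionWarn
	DecisionRedactAndAllow
	DecisionBlock
)

// tightenForIncognito removes the Warn-then-Allow outcome.
func tightenForIncognito(d Decision, path string) Decision {
	if path == "paste_image" {
		return DecisionBlock // the table lists image paste as blocked in incognito
	}
	switch d {
	case DecisionWarn:
		return DecisionRedactAndAllow // assumption: redact instead of asking
	default:
		return d // Allow (no matches), RedactAndAllow, and Block pass through
	}
}

func main() {
	fmt.Println(tightenForIncognito(DecisionWarn, "paste_text") == DecisionRedactAndAllow)
	fmt.Println(tightenForIncognito(DecisionWarn, "paste_image") == DecisionBlock)
}
```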
---
## Phases
### Phase A — Policy module + config
- **A-1:** Add `[security.sensitive]` section to config.go with
the five flags above.
- **A-2:** Add `internal/security/sensitive_policy.go` with
`Inspection`, `Decision`, `Decide`.
- **A-3:** Unit tests for the decision matrix.
### Phase B — Path 2 (pasted text)
Highest user-visible payoff for the smallest surface.
- **B-1:** TUI input composer intercepts paste, runs
`Decide(paste_text, ...)` before the bytes enter the buffer.
- **B-2:** Decision = Warn → status-line warning, paste still
goes in. `Tab` expands details.
- **B-3:** Decision = Block → paste discarded, status line
explains why; user can override with `Ctrl+Shift+V`
(force-paste) which bypasses but writes to audit log.
- **B-4:** Tests: paste-of-known-secret triggers warning;
redacted variant shows what would have been sent.
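Steps B-1 through B-3 amount to a pure function from the decision to the composer state, which is also what makes B-4 easy to test. The sketch below is illustrative only: `PasteOutcome` and `applyPaste` are hypothetical names, the status strings are placeholders, and only the `Allow`, `Warn`, and `Block` outcomes are handled (redaction needs match offsets, which this sketch does not model). The audit action names are the ones Phase E proposes.

```go
package main

import "fmt"

// Decision mirrors the enum proposed in the plan.
type Decision int

const (
	DecisionAllow Decision = iota
	DecisionWarn
	DecisionRedactAndAllow
	DecisionBlock
)

// PasteOutcome is a hypothetical result the composer would render.
type PasteOutcome struct {
	Buffer      string // composer contents after the paste attempt
	Status      string // status-line text; empty when nothing to report
	AuditAction string // Phase E action name; empty when nothing is logged
}

// applyPaste decides what reaches the buffer. force models the
// Ctrl+Shift+V override from B-3.
func applyPaste(buffer, pasted string, d Decision, force bool) PasteOutcome {
	switch d {
	case DecisionAllow:
		return PasteOutcome{Buffer: buffer + pasted}
	case DecisionWarn:
		return PasteOutcome{
			Buffer:      buffer + pasted,
			Status:      "possible secret in paste (Tab for details)",
			AuditAction: "paste_warn",
		}
	case DecisionBlock:
		if force {
			return PasteOutcome{
				Buffer:      buffer + pasted,
				Status:      "force-paste: block overridden, recorded in audit log",
				AuditAction: "paste_force_override",
			}
		}
		return PasteOutcome{
			Buffer:      buffer,
			Status:      "paste blocked: block-tier match (Ctrl+Shift+V to force)",
			AuditAction: "paste_block",
		}
	default:
		// Fail closed for outcomes this sketch does not handle.
		return PasteOutcome{Buffer: buffer, Status: "paste dropped: decision not handled"}
	}
}

func main() {
	out := applyPaste("token: ", "example-secret", DecisionBlock, false)
	fmt.Printf("%q %s\n", out.Buffer, out.AuditAction)
}
```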
### Phase C — Path 3 (tool-results) banner
- **C-1:** When `ScanToolResult` redacts ≥1 item, the engine
emits a system message: `firewall: redacted 2 items in
read-file output (see audit log)`.
- **C-2:** Gated behind `silent_tool_results` (default `false`,
so the banner shows). Users who already trust the firewall can
set it to `true` to silence the banner.
- **C-3:** Tests: integration test asserting the system
message appears.
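The banner in C-1 and the gate in C-2 fit in one small formatter. This is a sketch with a hypothetical function name, `toolResultBanner`; the singular "item" form for a count of one is an addition the plan does not specify.

```go
package main

import "fmt"

// toolResultBanner returns the system message for a tool result, or
// ok=false when nothing should be shown (no redactions, or the user
// set silent_tool_results = true).
func toolResultBanner(tool string, redacted int, silent bool) (msg string, ok bool) {
	if silent || redacted < 1 {
		return "", false
	}
	noun := "items"
	if redacted == 1 {
		noun = "item" // assumption: pluralise for readability
	}
	return fmt.Sprintf("firewall: redacted %d %s in %s output (see audit log)", redacted, noun, tool), true
}

func main() {
	if msg, ok := toolResultBanner("read-file", 2, false); ok {
		fmt.Println(msg) // firewall: redacted 2 items in read-file output (see audit log)
	}
}
```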
### Phase D — Path 1 (pasted images)
Most complex. Image OCR requires a local vision model; without
one the paste falls back to today's behaviour.
- **D-1:** Add OCR hook: when `ocr_image_paste = true` and a
vision-capable local arm is available, run a small OCR pass
over the image before insertion.
- **D-2:** Feed OCR output through the regex/entropy scanner.
Matches → `Decide(paste_image, ...)` with the original image
attached.
- **D-3:** TUI shows a preview thumbnail + warning before
insertion confirmation.
- **D-4:** Without a vision arm: feature degrades gracefully
(no OCR, paste proceeds as today, banner notes "image paste
scan unavailable — no local vision arm").
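D-1 and D-4, together with the 500ms hard cap listed under Risks, suggest a hook that either returns OCR text or a banner explaining why the scan was skipped. The following is a minimal sketch, assuming a hypothetical `OCRFunc` hook for the local vision arm; `ocrForPaste`, `fastOCR`, `slowOCR`, and the banner strings are all illustrative. It uses `context.Background()` internally to stay short; a real caller would pass its own context.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// OCRFunc is a hypothetical hook for the local vision arm. A nil
// OCRFunc means no vision-capable arm is configured.
type OCRFunc func(ctx context.Context, img []byte) (string, error)

const (
	ocrBudget     = 500 * time.Millisecond // hard cap from the Risks section
	bannerNoArm   = "image paste scan unavailable: no local vision arm"
	bannerTimeout = "image paste scan skipped: OCR exceeded time budget"
)

// ocrForPaste returns OCR text, or a banner when the scan did not run
// to completion. An empty banner means the text can go to the scanner.
func ocrForPaste(img []byte, ocr OCRFunc, budget time.Duration) (text, banner string) {
	if ocr == nil {
		return "", bannerNoArm
	}
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	type result struct {
		text string
		err  error
	}
	done := make(chan result, 1) // buffered so a late goroutine never blocks
	go func() {
		t, err := ocr(ctx, img)
		done <- result{t, err}
	}()

	select {
	case r := <-done:
		if r.err == nil {
			return r.text, ""
		}
		if ctx.Err() == nil {
			return "", "image paste scan failed: " + r.err.Error()
		}
	case <-ctx.Done():
	}
	return "", bannerTimeout
}

// fastOCR and slowOCR are demo stubs, not real models.
func fastOCR(ctx context.Context, _ []byte) (string, error) {
	return "api_key = example", nil
}

func slowOCR(ctx context.Context, _ []byte) (string, error) {
	select {
	case <-time.After(2 * time.Second):
		return "too late", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	text, banner := ocrForPaste([]byte{0x89}, fastOCR, ocrBudget)
	fmt.Printf("text=%q banner=%q\n", text, banner)
	_, banner = ocrForPaste([]byte{0x89}, slowOCR, ocrBudget/50)
	fmt.Println(banner)
}
```

In every skipped case the paste proceeds as it does today, so the hook can only add a warning and never blocks the user on a missing or slow model.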
### Phase E — Audit log integration
All four Decision outcomes get an audit entry. The audit log
already has the file format from the security-boundary work;
this phase only needs to define new Action values:
- `paste_warn`, `paste_block`, `paste_force_override`
- `image_paste_warn`, `image_paste_block`, `image_paste_ocr_skip`
- `tool_result_banner` (when redactions surfaced to user)
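As a sketch of how these values could be emitted, the block below declares them as constants and encodes one JSON line per event. The record shape is an assumption: `pasteAuditEvent` and its field names are hypothetical, chosen to follow the audit log's stated discipline of recording the matcher name, the source, and the length, but never the matched bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Action values proposed for Phase E.
const (
	ActionPasteWarn          = "paste_warn"
	ActionPasteBlock         = "paste_block"
	ActionPasteForceOverride = "paste_force_override"
	ActionImagePasteWarn     = "image_paste_warn"
	ActionImagePasteBlock    = "image_paste_block"
	ActionImagePasteOCRSkip  = "image_paste_ocr_skip"
	ActionToolResultBanner   = "tool_result_banner"
)

// pasteAuditEvent is a hypothetical record shape for one JSONL entry.
// It carries no field that could hold the matched text.
type pasteAuditEvent struct {
	Action  string `json:"action"`
	Source  string `json:"source"`
	Pattern string `json:"pattern"`
	Length  int    `json:"length"`
}

// auditLine encodes one event. Only len(matched) is kept; the matched
// text itself is discarded before encoding.
func auditLine(action, source, pattern, matched string) (string, error) {
	b, err := json.Marshal(pasteAuditEvent{
		Action:  action,
		Source:  source,
		Pattern: pattern,
		Length:  len(matched),
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := auditLine(ActionPasteWarn, "paste_text", "generic_api_key", "example-secret")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line)
	// {"action":"paste_warn","source":"paste_text","pattern":"generic_api_key","length":14}
}
```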
---
## Risks
- **OCR adds latency to paste.** Bad UX if image OCR takes >300ms.
Mitigation: hard-cap OCR time at 500ms, skip if exceeded, fall
back to no-scan path with banner notice. Local vision models on
consumer hardware should comfortably make this budget.
- **False positives on text paste become annoying.** If
`warn_on_paste_text = true` fires on every code snippet, users
turn it off and the protection is gone. Use the same
entropy_safelist Phase F-1 ships (uuid/sha/iso8601/url) — those
are the high-FP categories.
- **OCR introduces a new attack surface.** A malicious image could
exploit the OCR model. Mitigation: only local-arm OCR (the
attacker's input never leaves the machine); never call cloud
vision models for OCR (would defeat the privacy purpose).
- **Phase D depends on having a local vision model.** Users without
one get degraded UX. Document this clearly; consider whether to
ship a small bundled OCR-tuned model (probably no — adds 100MB+
to install).
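The false-positive mitigation in the second risk relies on a safelist of token shapes that look high-entropy but are harmless. The sketch below shows the idea only: the four regular expressions are illustrative guesses at the uuid/sha/iso8601/url categories, not the actual `entropy_safelist` from Phase F-1, and `safelisted` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// safePattern pairs a category name with an anchored pattern.
type safePattern struct {
	name string
	re   *regexp.Regexp
}

// safelist is checked in order. These patterns are assumptions made
// for illustration, not the Phase F-1 list.
var safelist = []safePattern{
	{"uuid", regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)},
	{"sha", regexp.MustCompile(`^([0-9a-f]{40}|[0-9a-f]{64})$`)},
	{"iso8601", regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$`)},
	{"url", regexp.MustCompile(`^https?://\S+$`)},
}

// safelisted reports the first category a token matches, if any.
func safelisted(token string) (category string, ok bool) {
	for _, p := range safelist {
		if p.re.MatchString(token) {
			return p.name, true
		}
	}
	return "", false
}

func main() {
	for _, tok := range []string{
		"123e4567-e89b-12d3-a456-426614174000",
		"2026-05-24T22:51:33+02:00",
		"ghp_notARealToken123",
	} {
		cat, ok := safelisted(tok)
		fmt.Printf("%-40s safelisted=%v %s\n", tok, ok, cat)
	}
}
```

The trade-off is that a secret which happens to share one of these shapes (a hex-encoded key, a URL with an embedded token) would also be skipped, so the safelist should only suppress the entropy detector and not the named regex matchers.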
---
## Open questions
- Should there be a "trusted projects" list where the warnings
are suppressed? Could live in the project registry (sibling
plan). Useful for monorepos where the user explicitly trusts
the local code.
- The `Ctrl+Shift+V` force-paste override is a footgun. Do we
want a confirm-second-time dialog, or just the keybind?
- Should clipboard contents be cleared from the host clipboard
after a sensitive paste? Cross-platform-tricky; defer.
- Sensitive-pattern feedback loop: when a user dismisses a warning
as "this isn't a secret", do we learn from that? Privacy concern
— would need an explicit opt-in.
---
## Rollout
Phases A + B + C land together as one feature release. Phase D
(image OCR) is opt-in (`ocr_image_paste = true`) and can land in
a follow-up patch — its surface is large and benefits from real-
world UX feedback. Phase E threads through all four; it lands
incrementally per phase, not as a single batch.
Realistic target: Phase A/B/C in v0.5.0; Phase D in v0.5.x. All
behaviour is gated behind the five config flags so existing users
who don't opt in see no behavioural change.
---
## Cross-references
- TODO.md entry "Sensitive-content handling — unified policy"
- [`2026-05-19-post-slm-unlock.md`](2026-05-19-post-slm-unlock.md) — Phase F entropy detection
- [`2026-05-19-security-wave2-incognito.md`](2026-05-19-security-wave2-incognito.md) — incognito-mode contract
- TODO.md entry "Security boundary — egress controls + session audit log" — the audit log this plan piggybacks on
+4 -4
@@ -15,7 +15,7 @@ require (
github.com/charmbracelet/x/ansi v0.11.6
github.com/openai/openai-go v1.12.0
github.com/pkoukk/tiktoken-go v0.1.8
golang.org/x/text v0.35.0
golang.org/x/text v0.37.0
google.golang.org/genai v1.52.1
gopkg.in/yaml.v3 v3.0.1
mvdan.cc/sh/v3 v3.13.0
@@ -63,10 +63,10 @@ require (
go.opentelemetry.io/otel v1.42.0 // indirect
go.opentelemetry.io/otel/metric v1.42.0 // indirect
go.opentelemetry.io/otel/trace v1.42.0 // indirect
golang.org/x/crypto v0.49.0 // indirect
golang.org/x/net v0.52.0 // indirect
golang.org/x/crypto v0.51.0 // indirect
golang.org/x/net v0.55.0 // indirect
golang.org/x/sync v0.20.0 // indirect
golang.org/x/sys v0.42.0 // indirect
golang.org/x/sys v0.45.0 // indirect
google.golang.org/api v0.267.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20260217215200-42d3e9bedb6d // indirect
google.golang.org/grpc v1.79.3 // indirect
+8 -8
@@ -142,18 +142,18 @@ go.opentelemetry.io/otel/sdk/metric v1.39.0 h1:cXMVVFVgsIf2YL6QkRF4Urbr/aMInf+2W
go.opentelemetry.io/otel/sdk/metric v1.39.0/go.mod h1:xq9HEVH7qeX69/JnwEfp6fVq5wosJsY1mt4lLfYdVew=
go.opentelemetry.io/otel/trace v1.42.0 h1:OUCgIPt+mzOnaUTpOQcBiM/PLQ/Op7oq6g4LenLmOYY=
go.opentelemetry.io/otel/trace v1.42.0/go.mod h1:f3K9S+IFqnumBkKhRJMeaZeNk9epyhnCmQh/EysQCdc=
golang.org/x/crypto v0.49.0 h1:+Ng2ULVvLHnJ/ZFEq4KdcDd/cfjrrjjNSXNzxg0Y4U4=
golang.org/x/crypto v0.49.0/go.mod h1:ErX4dUh2UM+CFYiXZRTcMpEcN8b/1gxEuv3nODoYtCA=
golang.org/x/crypto v0.51.0 h1:IBPXwPfKxY7cWQZ38ZCIRPI50YLeevDLlLnyC5wRGTI=
golang.org/x/crypto v0.51.0/go.mod h1:8AdwkbraGNABw2kOX6YFPs3WM22XqI4EXEd8g+x7Oc8=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo=
golang.org/x/net v0.52.0 h1:He/TN1l0e4mmR3QqHMT2Xab3Aj3L9qjbhRm78/6jrW0=
golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
golang.org/x/net v0.55.0 h1:bcvxaJn3e1U6InsFWt1JUq1aSjnRxLzT2rtD2KfkDF8=
golang.org/x/net v0.55.0/go.mod h1:L5U2KuzuOe1lY7Z+aWVIKK6qEeJXnXV9yzGA+WCHJww=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/text v0.35.0 h1:JOVx6vVDFokkpaq1AEptVzLTpDe9KGpj5tR4/X+ybL8=
golang.org/x/text v0.35.0/go.mod h1:khi/HExzZJ2pGnjenulevKNX1W67CUy0AsXcNubPGCA=
golang.org/x/sys v0.45.0 h1:dO4czNzziLiiXplLQgBCEpCvXQ3dnkn0SdaZSYdQ+FY=
golang.org/x/sys v0.45.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/text v0.37.0 h1:Cqjiwd9eSg8e0QAkyCaQTNHFIIzWtidPahFWR83rTrc=
golang.org/x/text v0.37.0/go.mod h1:a5sjxXGs9hsn/AJVwuElvCAo9v8QYLzvavO5z2PiM38=
gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
google.golang.org/api v0.267.0 h1:w+vfWPMPYeRs8qH1aYYsFX68jMls5acWl/jocfLomwE=
+34
@@ -157,6 +157,40 @@ type RouterSection struct {
// and incognito take priority over this knob. See
// docs/superpowers/plans/2026-05-23-prefer-routing-policy.md.
Prefer string `toml:"prefer"`
// Bandit exposes the selector's tuning knobs. Defaults preserve
// previous hard-coded behaviour exactly; only set these when you
// need to tune the EMA quality tracker for an unusual workload.
Bandit BanditSection `toml:"bandit"`
}
// BanditSection holds the scoring knobs for the EMA quality tracker
// and the score blend used by the selector. Each field has a sentinel
// zero value that means "use the built-in default" so an empty TOML
// block is byte-identical to pre-config behaviour. See
// internal/router/feedback.go and internal/router/selector.go for the
// formulas these knobs feed into.
type BanditSection struct {
// QualityAlpha is the EMA smoothing factor for arm-quality
// observations. Larger values weight recent observations more.
// Default: 0.3 (~3-sample memory). 0.0 here means "use default".
QualityAlpha float64 `toml:"quality_alpha"`
// MinObservations is the minimum number of samples required
// before observed EMA overrides the heuristic fallback. Default:
// 3. 0 here means "use default".
MinObservations int `toml:"min_observations"`
// ObservedWeight is the weight of the observed EMA in the
// observed/heuristic blend inside scoreArm: the final quality is
// `observed*W + heuristic*(1-W)`. Default: 0.7. 0.0 here means
// "use default".
ObservedWeight float64 `toml:"observed_weight"`
// StrengthBonus is the quality bonus added when an arm declares
// the current task type in its Strengths list. Default: 0.15.
// 0.0 here means "use default".
StrengthBonus float64 `toml:"strength_bonus"`
}
// MCPServerConfig defines an MCP server to start and connect to.
+1 -1
@@ -38,7 +38,7 @@ func TestTryLoadOAuthCredentials_Formats(t *testing.T) {
name: "camelCase and milliseconds expiry",
data: oauthCreds{
AccessToken2: "token-camel",
ExpiresAt: time.Now().Add(1 * time.Hour).UnixNano() / 1e6,
ExpiresAt: time.Now().Add(1*time.Hour).UnixNano() / 1e6,
TokenType2: "Bearer",
},
expectError: false,
+12 -1
@@ -109,8 +109,19 @@ var knownAgents = []CLIAgent{
// structured-output flag and no image-input mechanism. JSON support
// is faked via PromptResponseFormat (best-effort, model-dependent);
// see TODO.md for tracking native stream-json support.
//
// ToolUse is false on purpose. agy streams plain text and the
// agyParser turns every line into an EventTextDelta — there is
// no path for a structured ToolCall event to come back. With
// ToolUse=true the router would dispatch tool-needing tasks
// (security_review, spawn_elfs, file edit) to agy; the
// underlying Gemini model would describe calling the tool in
// prose (invented UUIDs and "I will pause now"-style stubs),
// the engine would receive only text, and the turn would hang
// waiting for a tool call that never arrives. Flip back to
// true when native stream-json lands.
Capabilities: provider.Capabilities{
ToolUse: true,
ToolUse: false,
ContextWindow: 200000,
},
PromptResponseFormat: true,
+4 -4
@@ -57,12 +57,12 @@ func benchTasks() []Task {
func BenchmarkSelectBest(b *testing.B) {
arms := benchArms()
tasks := benchTasks()
qt := NewQualityTracker()
qt := NewQualityTracker(0, 0)
b.ResetTimer()
for b.Loop() {
for _, task := range tasks {
selectBest(qt, arms, task, PreferAuto)
selectBest(qt, BanditParams{}, arms, task, PreferAuto)
}
}
}
@@ -99,13 +99,13 @@ func BenchmarkRouterSelect(b *testing.B) {
func BenchmarkScoreArm(b *testing.B) {
arms := benchArms()
qt := NewQualityTracker()
qt := NewQualityTracker(0, 0)
task := Task{Type: TaskGeneration, Priority: PriorityNormal, EstimatedTokens: 2000, RequiresTools: true, ComplexityScore: 0.5}
b.ResetTimer()
for b.Loop() {
for _, arm := range arms {
scoreArm(qt, arm, task)
scoreArm(qt, BanditParams{}, arm, task)
}
}
}
+4 -5
@@ -338,10 +338,10 @@ func TestRoutingDefaults_PayoffScenario(t *testing.T) {
}
cases := []struct {
name string
task Task
wantArmID ArmID
reason string
name string
task Task
wantArmID ArmID
reason string
}{
{
name: "Generation picks qwen3-coder",
@@ -472,4 +472,3 @@ func TestRoutingDefaults_LocalFleetVisibility(t *testing.T) {
}
}
}
+26 -6
@@ -2,9 +2,15 @@ package router
import "sync"
// Built-in defaults for the bandit knobs. Surfaced via
// [router.bandit] config keys; see BanditParams in router.go. Kept
// here so the QualityTracker has a sensible fallback when constructed
// without explicit parameters (tests, ad-hoc callers).
const (
qualityAlpha = 0.3 // EMA smoothing factor (~3-sample memory)
minObservations = 3 // min samples before observed score overrides heuristic
defaultQualityAlpha = 0.3 // EMA smoothing factor (~3-sample memory)
defaultMinObservations = 3 // min samples before observed score overrides heuristic
defaultObservedWeight = 0.7 // weight of observed score in observed/heuristic blend
defaultStrengthBonus = 0.15
)
// EMAScore tracks an exponential moving average quality score.
@@ -19,13 +25,27 @@ type QualityTracker struct {
mu sync.RWMutex
scores map[ArmID]map[TaskType]*EMAScore
classifierCount map[ClassifierSource]int
// Configurable knobs — set via NewQualityTracker. Pass 0 for any
// argument to keep the built-in default.
alpha float64
minObservations int
}
// NewQualityTracker returns an empty QualityTracker.
func NewQualityTracker() *QualityTracker {
// NewQualityTracker returns an empty QualityTracker. Pass 0 for any
// argument to keep the built-in default (alpha=0.3, minObs=3).
func NewQualityTracker(alpha float64, minObs int) *QualityTracker {
if alpha == 0 {
alpha = defaultQualityAlpha
}
if minObs == 0 {
minObs = defaultMinObservations
}
return &QualityTracker{
scores: make(map[ArmID]map[TaskType]*EMAScore),
classifierCount: make(map[ClassifierSource]int),
alpha: alpha,
minObservations: minObs,
}
}
@@ -71,7 +91,7 @@ func (qt *QualityTracker) Record(armID ArmID, taskType TaskType, success bool) {
if s.Count == 0 {
s.Value = observation
} else {
s.Value = qualityAlpha*observation + (1-qualityAlpha)*s.Value
s.Value = qt.alpha*observation + (1-qt.alpha)*s.Value
}
s.Count++
}
@@ -86,7 +106,7 @@ func (qt *QualityTracker) Quality(armID ArmID, taskType TaskType) (score float64
return 0, false
}
s, ok := m[taskType]
if !ok || s.Count < minObservations {
if !ok || s.Count < qt.minObservations {
return 0, false
}
return s.Value, true
+46 -4
@@ -8,7 +8,7 @@ import (
)
func TestQualityTracker_NoDataReturnsHeuristic(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
_, hasData := qt.Quality("arm:model", router.TaskGeneration)
if hasData {
t.Error("expected no data for unobserved arm")
@@ -16,7 +16,7 @@ func TestQualityTracker_NoDataReturnsHeuristic(t *testing.T) {
}
func TestQualityTracker_RecordUpdatesEMA(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
for i := 0; i < 3; i++ {
qt.Record("arm:model", router.TaskGeneration, true)
}
@@ -30,7 +30,7 @@ func TestQualityTracker_RecordUpdatesEMA(t *testing.T) {
}
func TestQualityTracker_AllFailuresLowScore(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
for i := 0; i < 5; i++ {
qt.Record("arm:model", router.TaskDebug, false)
}
@@ -41,7 +41,7 @@ func TestQualityTracker_AllFailuresLowScore(t *testing.T) {
}
func TestQualityTracker_ConcurrentSafe(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
done := make(chan struct{})
for i := 0; i < 10; i++ {
go func(success bool) {
@@ -113,3 +113,45 @@ func TestQualityTracker_InsufficientDataFallsBackToHeuristic(t *testing.T) {
}
decision.Rollback()
}
func TestQualityTracker_CustomAlphaShortensMemory(t *testing.T) {
// alpha=0.9 weights the latest sample heavily; after a single
// failure the score should drop further than with the default 0.3.
fast := router.NewQualityTracker(0.9, 0)
slow := router.NewQualityTracker(0.0, 0) // 0 → default 0.3
for _, qt := range []*router.QualityTracker{fast, slow} {
// Build up history at the high end with 5 successes.
for i := 0; i < 5; i++ {
qt.Record("arm:m", router.TaskGeneration, true)
}
// One failure.
qt.Record("arm:m", router.TaskGeneration, false)
}
fastScore, _ := fast.Quality("arm:m", router.TaskGeneration)
slowScore, _ := slow.Quality("arm:m", router.TaskGeneration)
if !(fastScore < slowScore) {
t.Errorf("expected fast alpha (0.9) to drop quality faster than default (0.3): fast=%f slow=%f", fastScore, slowScore)
}
}
func TestQualityTracker_CustomMinObservationsGatesScore(t *testing.T) {
// minObs=10 means Quality should return hasData=false until 10
// observations are recorded, even though the default would say
// "yes" after 3.
qt := router.NewQualityTracker(0, 10)
for i := 0; i < 5; i++ {
qt.Record("arm:m", router.TaskGeneration, true)
}
if _, hasData := qt.Quality("arm:m", router.TaskGeneration); hasData {
t.Error("expected hasData=false at 5 observations with minObs=10")
}
for i := 0; i < 5; i++ {
qt.Record("arm:m", router.TaskGeneration, true)
}
if _, hasData := qt.Quality("arm:m", router.TaskGeneration); !hasData {
t.Error("expected hasData=true after 10 observations with minObs=10")
}
}
+4 -4
@@ -54,10 +54,10 @@ func TestPolicyMultiplier(t *testing.T) {
cloudArm := &Arm{IsLocal: false}
cases := []struct {
name string
arm *Arm
policy PreferPolicy
want float64
name string
arm *Arm
policy PreferPolicy
want float64
}{
{"auto/local", localArm, PreferAuto, 1.0},
{"auto/cloud", cloudArm, PreferAuto, 1.0},
+7 -7
@@ -8,7 +8,7 @@ import (
)
func TestQualityTracker_SnapshotRestore_RoundTrip(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
// Record some outcomes
qt.Record("anthropic/claude-3-5-sonnet", router.TaskGeneration, true)
qt.Record("anthropic/claude-3-5-sonnet", router.TaskGeneration, true)
@@ -33,7 +33,7 @@ func TestQualityTracker_SnapshotRestore_RoundTrip(t *testing.T) {
}
// Restore into a fresh tracker
qt2 := router.NewQualityTracker()
qt2 := router.NewQualityTracker(0, 0)
qt2.Restore(restored)
// After restore, Quality() should return data (Count >= minObservations=3)
@@ -47,7 +47,7 @@ func TestQualityTracker_SnapshotRestore_RoundTrip(t *testing.T) {
}
func TestQualityTracker_Snapshot_Empty(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
snap := qt.Snapshot()
if snap.Scores == nil {
t.Error("scores map should be initialized (not nil)")
@@ -58,7 +58,7 @@ func TestQualityTracker_Snapshot_Empty(t *testing.T) {
}
func TestQualityTracker_ClassifierCounts_RecordAndSnapshot(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
qt.RecordClassifier(router.ClassifierHeuristic)
qt.RecordClassifier(router.ClassifierSLM)
qt.RecordClassifier(router.ClassifierSLM)
@@ -92,7 +92,7 @@ func TestQualityTracker_ClassifierCounts_RecordAndSnapshot(t *testing.T) {
if err := json.Unmarshal(data, &restored); err != nil {
t.Fatal(err)
}
qt2 := router.NewQualityTracker()
qt2 := router.NewQualityTracker(0, 0)
qt2.Restore(restored)
if qt2.ClassifierCounts()[router.ClassifierSLM] != 2 {
t.Errorf("restored slm count = %d, want 2", qt2.ClassifierCounts()[router.ClassifierSLM])
@@ -107,7 +107,7 @@ func TestQualityTracker_Restore_BackCompat_NoClassifierCounts(t *testing.T) {
if err := json.Unmarshal(legacy, &snap); err != nil {
t.Fatal(err)
}
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
qt.Restore(snap)
if qt.ClassifierCounts() == nil {
t.Error("ClassifierCounts() must return a non-nil map after restoring old snapshot")
@@ -122,7 +122,7 @@ func TestQualityTracker_Restore_BackCompat_NoClassifierCounts(t *testing.T) {
}
func TestQualityTracker_Restore_Replaces(t *testing.T) {
qt := router.NewQualityTracker()
qt := router.NewQualityTracker(0, 0)
qt.Record("arm-a", router.TaskDebug, true)
qt.Record("arm-a", router.TaskDebug, true)
qt.Record("arm-a", router.TaskDebug, true)
+40 -2
@@ -27,6 +27,7 @@ type Router struct {
preferPolicy PreferPolicy
quality *QualityTracker
bandit BanditParams
}
// PreferPolicy biases the scoring step toward local or cloud arms.
@@ -77,6 +78,41 @@ func (p PreferPolicy) String() string {
type Config struct {
Logger *slog.Logger
// Bandit tunes the selector's scoring knobs. Pass a zero value to
// keep all pre-config behaviour byte-identical; set individual
// fields to override the corresponding default.
Bandit BanditParams
}
// BanditParams controls the EMA quality tracker and score blend used
// by the selector. Each field has a "use default" sentinel (0 for
// floats and ints) so a zero-valued BanditParams is byte-identical to
// the pre-config hardcoded constants. Defaults are defined in
// resolveBanditParams below.
type BanditParams struct {
QualityAlpha float64
MinObservations int
ObservedWeight float64
StrengthBonus float64
}
// resolveBanditParams fills in the built-in defaults for any field
// left at its zero value. Centralised so the same defaults apply
// across NewQualityTracker, scoreArm, and any future caller.
func resolveBanditParams(p BanditParams) BanditParams {
if p.QualityAlpha == 0 {
p.QualityAlpha = defaultQualityAlpha
}
if p.MinObservations == 0 {
p.MinObservations = defaultMinObservations
}
if p.ObservedWeight == 0 {
p.ObservedWeight = defaultObservedWeight
}
if p.StrengthBonus == 0 {
p.StrengthBonus = defaultStrengthBonus
}
return p
}
func New(cfg Config) *Router {
@@ -84,10 +120,12 @@ func New(cfg Config) *Router {
if logger == nil {
logger = slog.Default()
}
params := resolveBanditParams(cfg.Bandit)
return &Router{
arms: make(map[ArmID]*Arm),
logger: logger,
quality: NewQualityTracker(),
quality: NewQualityTracker(params.QualityAlpha, params.MinObservations),
bandit: params,
}
}
@@ -172,7 +210,7 @@ func (r *Router) Select(task Task) RoutingDecision {
}
// Select best
best := selectBest(r.quality, feasible, task, r.preferPolicy)
best := selectBest(r.quality, r.bandit, feasible, task, r.preferPolicy)
if best == nil {
return RoutingDecision{Error: fmt.Errorf("selection failed")}
}
+7 -7
@@ -262,7 +262,7 @@ func TestSelectBest_PrefersToolSupport(t *testing.T) {
}
task := Task{Type: TaskGeneration, RequiresTools: true, Priority: PriorityNormal}
best := selectBest(nil, []*Arm{withoutTools, withTools}, task, PreferAuto)
best := selectBest(nil, BanditParams{}, []*Arm{withoutTools, withTools}, task, PreferAuto)
if best.ID != "a/with-tools" {
t.Errorf("should prefer arm with tool support, got %s", best.ID)
@@ -282,7 +282,7 @@ func TestSelectBest_PrefersThinkingForPlanning(t *testing.T) {
}
task := Task{Type: TaskPlanning, RequiresTools: true, Priority: PriorityNormal, EstimatedTokens: 5000}
best := selectBest(nil, []*Arm{noThinking, thinking}, task, PreferAuto)
best := selectBest(nil, BanditParams{}, []*Arm{noThinking, thinking}, task, PreferAuto)
if best.ID != "a/thinking" {
t.Errorf("should prefer thinking model for planning, got %s", best.ID)
@@ -625,7 +625,7 @@ func TestSelectBest_SmallArmWinsTrivialTask(t *testing.T) {
Capabilities: provider.Capabilities{ToolUse: false},
}
task := Task{Type: TaskExplain, ComplexityScore: 0.05, RequiresTools: false}
got := selectBest(nil, []*Arm{cliArm, smallArm}, task, PreferAuto)
got := selectBest(nil, BanditParams{}, []*Arm{cliArm, smallArm}, task, PreferAuto)
if got != smallArm {
t.Errorf("selectBest = %v, want smallArm", got)
}
@@ -647,7 +647,7 @@ func TestSelectBest_CLIAgentWinsComplexTask(t *testing.T) {
Capabilities: provider.Capabilities{ToolUse: false},
}
task := Task{Type: TaskRefactor, ComplexityScore: 0.7, RequiresTools: true}
got := selectBest(nil, []*Arm{cliArm, smallArm}, task, PreferAuto)
got := selectBest(nil, BanditParams{}, []*Arm{cliArm, smallArm}, task, PreferAuto)
if got != cliArm {
t.Errorf("selectBest = %v, want cliArm", got)
}
@@ -672,21 +672,21 @@ func TestSelectBest_TierPreference(t *testing.T) {
task := Task{Type: TaskGeneration, Priority: PriorityNormal, EstimatedTokens: 1000}
t.Run("CLI beats local and API", func(t *testing.T) {
best := selectBest(nil, []*Arm{apiArm, localArm, cliArm}, task, PreferAuto)
best := selectBest(nil, BanditParams{}, []*Arm{apiArm, localArm, cliArm}, task, PreferAuto)
if best.ID != "subprocess/claude" {
t.Errorf("want subprocess/claude (tier 0), got %s", best.ID)
}
})
t.Run("local beats API when no CLI", func(t *testing.T) {
best := selectBest(nil, []*Arm{apiArm, localArm}, task, PreferAuto)
best := selectBest(nil, BanditParams{}, []*Arm{apiArm, localArm}, task, PreferAuto)
if best.ID != "ollama/llama3" {
t.Errorf("want ollama/llama3 (tier 1), got %s", best.ID)
}
})
t.Run("API selected when only option", func(t *testing.T) {
best := selectBest(nil, []*Arm{apiArm}, task, PreferAuto)
best := selectBest(nil, BanditParams{}, []*Arm{apiArm}, task, PreferAuto)
if best == nil || best.ID != "mistral/mistral-large" {
t.Errorf("want mistral/mistral-large (tier 2), got %v", best)
}
+13 -13
@@ -98,7 +98,7 @@ func armBaseTier(arm *Arm, task Task) int {
//
// Step 2 (fallback): walk tiers low→high. Within a tier, highest-scoring
// arm wins.
func selectBest(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
func selectBest(qt *QualityTracker, params BanditParams, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
if len(arms) == 0 {
return nil
}
@@ -110,7 +110,7 @@ func selectBest(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy)
}
}
if len(promoted) > 0 {
return bestScored(qt, promoted, task, prefer)
return bestScored(qt, params, promoted, task, prefer)
}
// Walk tiers low→high. armTier returns up to 5 when prefer is set
@@ -124,18 +124,18 @@ func selectBest(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy)
}
}
if len(inTier) > 0 {
return bestScored(qt, inTier, task, prefer)
return bestScored(qt, params, inTier, task, prefer)
}
}
return nil
}
// bestScored returns the highest-scoring arm within a set.
func bestScored(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
func bestScored(qt *QualityTracker, params BanditParams, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
var best *Arm
bestScore := math.Inf(-1)
for _, arm := range arms {
score := scoreArm(qt, arm, task) * policyMultiplier(arm, prefer)
score := scoreArm(qt, params, arm, task) * policyMultiplier(arm, prefer)
if score > bestScore {
bestScore = score
best = arm
@@ -172,13 +172,12 @@ func policyMultiplier(arm *Arm, p PreferPolicy) float64 {
}
}
// strengthScoreBonus is added to quality when an arm's Strengths list
// matches the incoming task type. Tunable in one place.
const strengthScoreBonus = 0.15
// scoreArm computes a quality/cost score for an arm.
// When the quality tracker has sufficient observations, blends observed EMA
// (70%) with heuristic (30%). Falls back to pure heuristic otherwise.
// (default 70%) with heuristic (default 30%). Falls back to pure heuristic
// otherwise. The blend ratio and strength bonus are tunable via
// BanditParams (config: [router.bandit]); a zero-valued params falls back
// to the built-in defaults.
//
// Strengths add a fixed bonus to quality when matching task.Type. CostWeight
// dampens the cost penalty linearly:
@@ -189,16 +188,17 @@ const strengthScoreBonus = 0.15
// the original effectiveCost == cost. With CostWeight=0 cost is fully
// ignored (effectiveCost = 1.0). Local arms with sub-1 raw costs are not
// amplified by fractional weights (the linear formula stays monotone).
func scoreArm(qt *QualityTracker, arm *Arm, task Task) float64 {
func scoreArm(qt *QualityTracker, params BanditParams, arm *Arm, task Task) float64 {
params = resolveBanditParams(params)
hq := heuristicQuality(arm, task)
quality := hq
if qt != nil {
if observed, hasData := qt.Quality(arm.ID, task.Type); hasData {
quality = 0.7*observed + 0.3*hq
quality = params.ObservedWeight*observed + (1-params.ObservedWeight)*hq
}
}
if arm.HasStrength(task.Type) {
quality += strengthScoreBonus
quality += params.StrengthBonus
}
value := task.ValueScore()
rawCost := effectiveCost(arm, task)
+12 -12
@@ -65,17 +65,17 @@ func TestScoreArm_CostWeightAffectsArmComparison(t *testing.T) {
// CostWeight=1.0: cost dominates, cheap arm wins.
cheap.CostWeight, expensive.CostWeight = 1.0, 1.0
if scoreArm(nil, cheap, task) <= scoreArm(nil, expensive, task) {
if scoreArm(nil, BanditParams{}, cheap, task) <= scoreArm(nil, BanditParams{}, expensive, task) {
t.Errorf("CostWeight=1.0: cheap arm should beat expensive arm; cheap=%v expensive=%v",
scoreArm(nil, cheap, task), scoreArm(nil, expensive, task))
scoreArm(nil, BanditParams{}, cheap, task), scoreArm(nil, BanditParams{}, expensive, task))
}
// CostWeight=0.0: cost ignored, quality alone decides → expensive (better
// context window) wins.
cheap.CostWeight, expensive.CostWeight = 0.001, 0.001
if scoreArm(nil, expensive, task) <= scoreArm(nil, cheap, task) {
if scoreArm(nil, BanditParams{}, expensive, task) <= scoreArm(nil, BanditParams{}, cheap, task) {
t.Errorf("CostWeight~0: higher-quality expensive arm should beat cheap arm; expensive=%v cheap=%v",
scoreArm(nil, expensive, task), scoreArm(nil, cheap, task))
scoreArm(nil, BanditParams{}, expensive, task), scoreArm(nil, BanditParams{}, cheap, task))
}
}
@@ -140,8 +140,8 @@ func TestScoreArm_StrengthBonus(t *testing.T) {
}
task := Task{Type: TaskSecurityReview, EstimatedTokens: 5000, RequiresTools: true, Priority: PriorityNormal}
a := scoreArm(nil, withoutStrength, task)
b := scoreArm(nil, withStrength, task)
a := scoreArm(nil, BanditParams{}, withoutStrength, task)
b := scoreArm(nil, BanditParams{}, withStrength, task)
if !(b > a) {
t.Errorf("strength-tagged arm score (%v) should exceed plain arm score (%v)", b, a)
}
@@ -160,8 +160,8 @@ func TestScoreArm_StrengthBonusDoesNotApplyToOtherTasks(t *testing.T) {
}
task := Task{Type: TaskDebug, EstimatedTokens: 5000, RequiresTools: true, Priority: PriorityNormal}
a := scoreArm(nil, plain, task)
b := scoreArm(nil, tagged, task)
a := scoreArm(nil, BanditParams{}, plain, task)
b := scoreArm(nil, BanditParams{}, tagged, task)
if math.Abs(a-b) > 1e-9 {
t.Errorf("non-matching task should ignore Strengths: plain=%v tagged=%v", a, b)
}
@@ -184,7 +184,7 @@ func TestSelectBest_StrengthPromotedArmBeatsCLIAgent(t *testing.T) {
}
task := Task{Type: TaskSecurityReview, EstimatedTokens: 5000, RequiresTools: true, Priority: PriorityNormal}
got := selectBest(nil, []*Arm{cliAgent, opus}, task, PreferAuto)
got := selectBest(nil, BanditParams{}, []*Arm{cliAgent, opus}, task, PreferAuto)
if got == nil {
t.Fatal("selectBest returned nil")
}
@@ -208,7 +208,7 @@ func TestSelectBest_EmptyStrengthsPreservesTierOrder(t *testing.T) {
}
task := Task{Type: TaskSecurityReview, EstimatedTokens: 5000, RequiresTools: true, Priority: PriorityNormal}
got := selectBest(nil, []*Arm{cliAgent, opus}, task, PreferAuto)
got := selectBest(nil, BanditParams{}, []*Arm{cliAgent, opus}, task, PreferAuto)
if got.ID != cliAgent.ID {
t.Errorf("without Strengths, CLI-agent tier-1 should win; got %s", got.ID)
}
@@ -327,7 +327,7 @@ func TestSelectBest_MultiplePromotedArmsBestQualityWins(t *testing.T) {
Strengths: []TaskType{TaskSecurityReview},
}
qt := NewQualityTracker()
qt := NewQualityTracker(0, 0)
// armB has consistently succeeded — minObservations=3 is enough to flip
// the score blend.
for i := 0; i < 5; i++ {
@@ -339,7 +339,7 @@ func TestSelectBest_MultiplePromotedArmsBestQualityWins(t *testing.T) {
}
task := Task{Type: TaskSecurityReview, EstimatedTokens: 5000, RequiresTools: true, Priority: PriorityNormal}
-got := selectBest(qt, []*Arm{armA, armB}, task, PreferAuto)
+got := selectBest(qt, BanditParams{}, []*Arm{armA, armB}, task, PreferAuto)
if got == nil {
t.Fatal("selectBest returned nil")
}
+10 -10
@@ -10,16 +10,16 @@ import (
// Caller passes whatever is known at launch time; empty fields are
// omitted from the rendered banner.
type SessionInfo struct {
-Version string // e.g. "0.2.1"
-GitBranch string // empty if not in a git repo
-GitDirty bool // true if working tree has uncommitted changes
-ProjectType string // free-form, e.g. "Go module (somegit.dev/...)"
-Provider string // e.g. "ollama"
-Model string // e.g. "qwen3-coder:30b"
-Permission string // e.g. "auto", "accept_edits"
-Incognito bool
-Prefer string // "auto" / "local" / "cloud"
-Tenant string // optional, e.g. Kubernetes context name
+Version string // e.g. "0.2.1"
+GitBranch string // empty if not in a git repo
+GitDirty bool // true if working tree has uncommitted changes
+ProjectType string // free-form, e.g. "Go module (somegit.dev/...)"
+Provider string // e.g. "ollama"
+Model string // e.g. "qwen3-coder:30b"
+Permission string // e.g. "auto", "accept_edits"
+Incognito bool
+Prefer string // "auto" / "local" / "cloud"
+Tenant string // optional, e.g. Kubernetes context name
}
// RenderContextBanner returns the always-shown banner with cwd, git,
+1 -1
@@ -21,7 +21,7 @@ func TestScanCWDForSensitive_Matches(t *testing.T) {
}
// Non-sensitive control files.
control := []string{
-".envrc", // direnv config, not a credential
+".envrc", // direnv config, not a credential
"main.go",
"README.md",
"secret_handler.go", // source code, not data
+121
@@ -0,0 +1,121 @@
package security
import (
"encoding/json"
"log/slog"
"os"
"path/filepath"
"sync"
"time"
)
// AuditEvent records a single firewall action (block / redact / warn /
// unicode_sanitize) in a structured form intended for per-session
// post-mortem grepping.
//
// Discipline: this struct must never carry the raw bytes of any matched
// secret. The Pattern field names the matcher (e.g. "anthropic_api_key",
// "high_entropy"); TokenLen carries the length of the offending token so
// the user can recognise it in a transcript without re-leaking it.
type AuditEvent struct {
// Timestamp is the wall-clock time of the event in UTC.
Timestamp time.Time `json:"ts"`
// Action is one of: "block", "redact", "warn", "unicode_sanitize".
Action string `json:"action"`
// Pattern is the human-readable matcher name (regex tag or
// "high_entropy" / "unicode"). Never the matched bytes themselves.
Pattern string `json:"pattern,omitempty"`
// Source describes where in the data flow the event fired —
// "message_text", "tool_result", "tool_call_args",
// "system_prompt", etc.
Source string `json:"source,omitempty"`
// TokenLen is the length of the offending token (or chars
// changed for unicode_sanitize). Length only, never the bytes.
TokenLen int `json:"token_len,omitempty"`
}
// AuditLogger appends AuditEvent records to a per-session JSON Lines
// file. Safe for concurrent use. Writes are skipped while incognito
// mode is active so the no-persistence contract is honoured.
//
// A nil *AuditLogger is a valid no-op — callers can use the same
// `audit.Record(...)` shape whether or not auditing is configured.
type AuditLogger struct {
path string
incognito *IncognitoMode
logger *slog.Logger
mu sync.Mutex
}
// AuditLoggerConfig controls how AuditLogger is constructed.
type AuditLoggerConfig struct {
// Path is the full filesystem path to write JSONL events to.
// Parent directories are created lazily by Record, not by NewAuditLogger.
Path string
// Incognito gates writes; when active, Record is a no-op.
// Optional — pass nil to always persist.
Incognito *IncognitoMode
// Logger receives one Warn per write failure so the user sees
// disk-full / permission errors instead of silently losing
// audit records. Defaults to slog.Default() when nil.
Logger *slog.Logger
}
// NewAuditLogger builds an AuditLogger. Pass a zero Path to disable
// auditing (returns nil).
func NewAuditLogger(cfg AuditLoggerConfig) *AuditLogger {
if cfg.Path == "" {
return nil
}
logger := cfg.Logger
if logger == nil {
logger = slog.Default()
}
return &AuditLogger{
path: cfg.Path,
incognito: cfg.Incognito,
logger: logger,
}
}
// Record appends an event to the audit log. Safe to call on a nil
// receiver (no-op). Skipped silently when incognito is active.
// Write failures are logged at Warn level but do not propagate to
// the caller — auditing is best-effort and must not crash the
// scanner pipeline.
func (a *AuditLogger) Record(ev AuditEvent) {
if a == nil {
return
}
if a.incognito != nil && a.incognito.Active() {
return
}
if ev.Timestamp.IsZero() {
ev.Timestamp = time.Now().UTC()
}
a.mu.Lock()
defer a.mu.Unlock()
if err := os.MkdirAll(filepath.Dir(a.path), 0o700); err != nil {
a.logger.Warn("audit: mkdir failed", "path", a.path, "err", err)
return
}
f, err := os.OpenFile(a.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
if err != nil {
a.logger.Warn("audit: open failed", "path", a.path, "err", err)
return
}
defer f.Close()
if err := json.NewEncoder(f).Encode(ev); err != nil {
a.logger.Warn("audit: encode failed", "path", a.path, "err", err)
}
}
// Path returns the file path the logger writes to. Empty when the
// logger is disabled (nil receiver returns "").
func (a *AuditLogger) Path() string {
if a == nil {
return ""
}
return a.path
}
+139
@@ -0,0 +1,139 @@
package security
import (
"bufio"
"encoding/json"
"os"
"path/filepath"
"strings"
"testing"
)
func readAuditLines(t *testing.T, path string) []AuditEvent {
t.Helper()
f, err := os.Open(path)
if err != nil {
t.Fatalf("open audit log: %v", err)
}
defer f.Close()
var events []AuditEvent
sc := bufio.NewScanner(f)
for sc.Scan() {
var ev AuditEvent
if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
t.Fatalf("decode line %q: %v", sc.Text(), err)
}
events = append(events, ev)
}
if err := sc.Err(); err != nil {
t.Fatalf("scan audit log: %v", err)
}
return events
}
func TestAuditLogger_NilReceiverIsNoop(t *testing.T) {
var a *AuditLogger
// Must not panic.
a.Record(AuditEvent{Action: "block"})
}
func TestAuditLogger_DisabledWhenPathEmpty(t *testing.T) {
a := NewAuditLogger(AuditLoggerConfig{})
if a != nil {
t.Errorf("expected nil logger for empty path, got %v", a)
}
}
func TestAuditLogger_AppendsJSONLines(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "audit.jsonl")
a := NewAuditLogger(AuditLoggerConfig{Path: path})
if a == nil {
t.Fatal("expected non-nil logger")
}
a.Record(AuditEvent{Action: "block", Pattern: "anthropic_api_key", Source: "tool_result", TokenLen: 51})
a.Record(AuditEvent{Action: "redact", Pattern: "high_entropy", Source: "message_text", TokenLen: 42})
events := readAuditLines(t, path)
if len(events) != 2 {
t.Fatalf("expected 2 events, got %d", len(events))
}
if events[0].Action != "block" || events[0].Pattern != "anthropic_api_key" {
t.Errorf("event 0 = %+v", events[0])
}
if events[0].Timestamp.IsZero() {
t.Error("event 0 missing timestamp")
}
if events[1].Action != "redact" || events[1].TokenLen != 42 {
t.Errorf("event 1 = %+v", events[1])
}
}
func TestAuditLogger_SkipsUnderIncognito(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "audit.jsonl")
incog := NewIncognitoMode()
a := NewAuditLogger(AuditLoggerConfig{Path: path, Incognito: incog})
incog.Activate()
a.Record(AuditEvent{Action: "block", Pattern: "x"})
if _, err := os.Stat(path); !os.IsNotExist(err) {
t.Errorf("expected audit file to not exist under incognito, got err=%v", err)
}
incog.Deactivate()
a.Record(AuditEvent{Action: "block", Pattern: "y"})
events := readAuditLines(t, path)
if len(events) != 1 {
t.Fatalf("expected 1 event after deactivate, got %d", len(events))
}
if events[0].Pattern != "y" {
t.Errorf("expected pattern=y (incognito event dropped), got %q", events[0].Pattern)
}
}
func TestAuditLogger_CreatesParentDir(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "deeply", "nested", "audit.jsonl")
a := NewAuditLogger(AuditLoggerConfig{Path: path})
a.Record(AuditEvent{Action: "block"})
if _, err := os.Stat(path); err != nil {
t.Errorf("expected audit file at %s, got err=%v", path, err)
}
}
func TestFirewall_RecordsRedactionToAudit(t *testing.T) {
dir := t.TempDir()
auditPath := filepath.Join(dir, "audit.jsonl")
audit := NewAuditLogger(AuditLoggerConfig{Path: auditPath})
fw := NewFirewall(FirewallConfig{
ScanOutgoing: true,
ScanToolResults: true,
Audit: audit,
})
// Anthropic key prefix is a built-in redact pattern; emit it
// through the tool-result scanning path.
cleaned := fw.ScanToolResult("here is the key sk-ant-abcdef1234567890abcdef1234567890abcdef")
if !strings.Contains(cleaned, "[REDACTED]") {
t.Errorf("expected [REDACTED] in cleaned content, got %q", cleaned)
}
events := readAuditLines(t, auditPath)
var sawAnthropicRedact bool
for _, ev := range events {
if ev.Action == "redact" && ev.Pattern == "anthropic_api_key" && ev.Source == "tool_result" {
sawAnthropicRedact = true
if ev.TokenLen == 0 {
t.Errorf("expected non-zero TokenLen on redact event, got %+v", ev)
}
}
}
if !sawAnthropicRedact {
t.Errorf("expected an anthropic_api_key redact event in audit log, got %+v", events)
}
}
+36
@@ -14,6 +14,7 @@ type Firewall struct {
scanner *Scanner
incognito *IncognitoMode
logger *slog.Logger
audit *AuditLogger // optional; nil = no per-session audit log
// Config
scanOutgoing bool
@@ -27,6 +28,11 @@ type FirewallConfig struct {
EntropyThreshold float64
EntropySafelist []string
Logger *slog.Logger
// Audit is the optional per-session audit logger. It may be left nil
// here and attached later via SetAudit, because the firewall is
// typically constructed before the session ID is generated.
// nil is safe; auditing simply turns into a no-op.
Audit *AuditLogger
}
func NewFirewall(cfg FirewallConfig) *Firewall {
@@ -50,11 +56,20 @@ func NewFirewall(cfg FirewallConfig) *Firewall {
scanner: scanner,
incognito: NewIncognitoMode(),
logger: logger,
audit: cfg.Audit,
scanOutgoing: cfg.ScanOutgoing,
scanToolResults: cfg.ScanToolResults,
}
}
// SetAudit attaches an AuditLogger after construction. The firewall
// is typically built before the session ID exists, so callers usually
// construct the AuditLogger later and inject it via this setter.
// Pass nil to disable auditing.
func (f *Firewall) SetAudit(a *AuditLogger) {
f.audit = a
}
// Incognito returns the incognito mode controller.
func (f *Firewall) Incognito() *IncognitoMode {
return f.incognito
@@ -131,7 +146,16 @@ func (f *Firewall) scanMessage(m message.Message) message.Message {
func (f *Firewall) scanAndRedact(content, source string) string {
// Unicode sanitization first
originalLen := len(content)
content = SanitizeUnicode(content)
if delta := originalLen - len(content); delta != 0 {
f.audit.Record(AuditEvent{
Action: "unicode_sanitize",
Pattern: "unicode",
Source: source,
TokenLen: delta,
})
}
// Secret scanning
matches := f.scanner.Scan(content)
@@ -146,6 +170,12 @@ func (f *Firewall) scanAndRedact(content, source string) string {
"pattern", m.Pattern,
"source", source,
)
f.audit.Record(AuditEvent{
Action: "block",
Pattern: m.Pattern,
Source: source,
TokenLen: m.End - m.Start,
})
return "[BLOCKED: content contained a secret]"
default:
f.logger.Debug("secret redacted",
@@ -153,6 +183,12 @@ func (f *Firewall) scanAndRedact(content, source string) string {
"action", m.Action,
"source", source,
)
f.audit.Record(AuditEvent{
Action: string(m.Action),
Pattern: m.Pattern,
Source: source,
TokenLen: m.End - m.Start,
})
}
}
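The `f.audit.Record(...)` calls in this hunk carry no nil checks because `Record` is documented as safe on a nil receiver. A minimal sketch of that Go pattern follows, using hypothetical `recorder` and `firewall` types invented for this example rather than the real `AuditLogger` and `Firewall`.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for AuditLogger.
type recorder struct {
	events []string
}

// Record is safe on a nil receiver: the nil check runs before any
// field of r is touched.
func (r *recorder) Record(action string) {
	if r == nil {
		return
	}
	r.events = append(r.events, action)
}

// firewall is a hypothetical stand-in holding an optional recorder.
type firewall struct {
	audit *recorder // nil means auditing is disabled
}

// scan always "redacts" and reports the action to the recorder.
func (f *firewall) scan(content string) string {
	f.audit.Record("redact") // no nil check needed at the call site
	return "[REDACTED]"
}

func main() {
	off := &firewall{} // auditing disabled
	fmt.Println(off.scan("secret"))

	rec := &recorder{}
	on := &firewall{audit: rec}
	on.scan("secret")
	fmt.Println(len(rec.events))
}
```

The pattern relies on the field being a concrete pointer type. If the field were an interface left entirely nil, the method call would panic instead of being skipped.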
+23 -1
@@ -1403,6 +1403,28 @@ func (m Model) handleCommand(cmd string) (tea.Model, tea.Cmd) {
m.injectSystemContext(msg)
return m, nil
case "/router":
if m.config.Router == nil {
m.messages = append(m.messages, chatMessage{role: "error", content: "router not configured"})
return m, nil
}
if args == "" || args == "help" {
current := m.config.Router.PreferPolicy().String()
m.messages = append(m.messages, chatMessage{role: "system",
content: fmt.Sprintf("router.prefer = %s\nUsage: /router <auto|local|cloud>\n auto — no bias; tier order + Strengths decide\n local — cloud arms demoted; locals win when feasible\n cloud — local arms demoted; cloud arms win (except tier-0 SLM)", current)})
return m, nil
}
policy, err := router.ParsePreferPolicy(args)
if err != nil {
m.messages = append(m.messages, chatMessage{role: "error", content: err.Error()})
return m, nil
}
m.config.Router.SetPreferPolicy(policy)
msg := fmt.Sprintf("router.prefer = %s (runtime override; not written to config)", policy.String())
m.messages = append(m.messages, chatMessage{role: "system", content: msg})
m.injectSystemContext(msg)
return m, nil
case "/profile":
if args == "" {
m = m.closeAllPickers()
@@ -1532,7 +1554,7 @@ func (m Model) handleCommand(cmd string) (tea.Model, tea.Cmd) {
return m, nil
}
m.messages = append(m.messages, chatMessage{role: "system",
-content: "Commands:\n /init generate or update AGENTS.md project docs\n /clear, /new clear chat and start new conversation\n /config show current config\n /incognito toggle incognito (Ctrl+X)\n /keys show keyboard shortcuts\n /model [name] list/switch models\n /permission [mode] set permission mode (Shift+Tab to cycle)\n /plugins list installed plugins\n /profile [name] list profiles / switch (re-execs gnoma)\n /provider show current provider\n /replay scroll to top to re-read conversation\n /resume [id] list or restore saved sessions\n /shell [cmd] open interactive shell (or run cmd in shell)\n /skills list loaded skills\n /usage show token usage and cost\n /help show this help\n /quit exit gnoma\n\nSkills (use /<name> [args] to invoke):\n Add .md files with YAML front matter to .gnoma/skills/ or ~/.config/gnoma/skills/"})
+content: "Commands:\n /init generate or update AGENTS.md project docs\n /clear, /new clear chat and start new conversation\n /config show current config\n /incognito toggle incognito (Ctrl+X)\n /keys show keyboard shortcuts\n /model [name] list/switch models\n /permission [mode] set permission mode (Shift+Tab to cycle)\n /plugins list installed plugins\n /profile [name] list profiles / switch (re-execs gnoma)\n /provider show current provider\n /replay scroll to top to re-read conversation\n /resume [id] list or restore saved sessions\n /router [mode] show or set routing preference (auto/local/cloud)\n /shell [cmd] open interactive shell (or run cmd in shell)\n /skills list loaded skills\n /usage show token usage and cost\n /help show this help\n /quit exit gnoma\n\nSkills (use /<name> [args] to invoke):\n Add .md files with YAML front matter to .gnoma/skills/ or ~/.config/gnoma/skills/"})
return m, nil
case "/keys":
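The `/router` handler above depends on `router.ParsePreferPolicy` and on the policy's `String()` method, neither of which appears in this diff. The sketch below shows one way such a parser could look. The names `preferPolicy` and `parsePrefer`, the trimming and case folding, and the error wording are all assumptions made for illustration, not the router package's actual behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// preferPolicy is a hypothetical stand-in for the router's policy type.
type preferPolicy int

const (
	preferAuto preferPolicy = iota
	preferLocal
	preferCloud
)

// String returns the user-facing name of the policy.
func (p preferPolicy) String() string {
	switch p {
	case preferLocal:
		return "local"
	case preferCloud:
		return "cloud"
	default:
		return "auto"
	}
}

// parsePrefer maps user input to a policy. Trimming and lower-casing
// are assumptions of this sketch; the real parser may be stricter.
func parsePrefer(s string) (preferPolicy, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "auto":
		return preferAuto, nil
	case "local":
		return preferLocal, nil
	case "cloud":
		return preferCloud, nil
	}
	return preferAuto, fmt.Errorf("unknown routing preference %q (want auto, local, or cloud)", s)
}

func main() {
	for _, in := range []string{"local", " Cloud ", "fast"} {
		p, err := parsePrefer(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("router.prefer = %s\n", p)
	}
}
```

Returning an error for unknown input fits the handler above, which forwards `err.Error()` to the chat as an error message.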
+35 -5
@@ -22,7 +22,10 @@ var builtinCommands = []cmdEntry{
{"/exit", "exit gnoma"},
{"/help", "show available commands and shortcuts"},
{"/incognito", "toggle incognito mode (no persistence, local-only routing)"},
-{"/init", "initialize project — create AGENTS.md"},
+// /init is provided by the bundled skill at
+// internal/skill/skills/init.md; do not duplicate it here. The dedup
+// in completionSource() would skip a duplicate entry anyway, but
+// omitting it keeps the source-of-truth single.
{"/keys", "show keyboard shortcuts"},
{"/model", "list or switch active model"},
{"/new", "start a new conversation"},
@@ -34,6 +37,7 @@ var builtinCommands = []cmdEntry{
{"/quit", "quit gnoma"},
{"/replay", "replay last assistant response"},
{"/resume", "browse and resume a saved session"},
{"/router", "show or set routing preference (auto/local/cloud)"},
{"/shell", "open interactive shell"},
{"/theme", "list themes or set active theme"},
{"/skills", "list available skills"},
@@ -46,11 +50,27 @@ var permissionModes = []string{
"auto", "default", "accept_edits", "bypass", "deny", "plan",
}
-// completionSource builds a sorted command list from builtins + skills.
-func completionSource(skills *skill.Registry) []cmdEntry {
-entries := make([]cmdEntry, len(builtinCommands))
-copy(entries, builtinCommands)
+// routerPreferModes lists valid values for /router completion.
+var routerPreferModes = []string{"auto", "local", "cloud"}
+// completionSource builds a sorted command list from builtins + skills.
+// Skill names shadow builtin names so a skill (bundled or user-defined)
+// can replace a static entry without producing a duplicate in the picker.
+func completionSource(skills *skill.Registry) []cmdEntry {
+skillNames := make(map[string]struct{})
+if skills != nil {
+for _, s := range skills.All() {
+skillNames["/"+s.Frontmatter.Name] = struct{}{}
+}
+}
+entries := make([]cmdEntry, 0, len(builtinCommands)+len(skillNames))
+for _, c := range builtinCommands {
+if _, shadowed := skillNames[c.name]; shadowed {
+continue
+}
+entries = append(entries, c)
+}
if skills != nil {
for _, s := range skills.All() {
desc := s.Frontmatter.Description
@@ -150,6 +170,16 @@ func matchArgCompletion(input string, profileNames []string, providerNames []str
return cmd + " " + mode
}
}
case "/router":
if arg == "" {
return ""
}
lower := strings.ToLower(arg)
for _, mode := range routerPreferModes {
if strings.HasPrefix(mode, lower) && mode != arg {
return cmd + " " + mode
}
}
case "/profile":
if arg == "" || len(profileNames) == 0 {
return ""