- Add for-the-badge style shields (release, license, Go 1.26+, GHCR)
- Drop the "until the first tag is cut" line that's been stale since
v0.1.0 shipped on 2026-05-20
- Add a Vision / image input section covering Ctrl+V paste, literal
[Image: /path] markers, the 10 MiB cap, the incognito carve-out,
and the router's Vision capability gating
- Add a Subprocess sandbox bypass subsection under Providers
documenting GNOMA_AGY_BYPASS_PERMISSIONS and
GNOMA_CODEX_BYPASS_SANDBOX as deliberate footguns
- Add an Entropy false-positive reduction subsection under Security
showing the [security].entropy_safelist opt-in (Phase F-1) and
noting the per-pattern Debug telemetry that feeds F-2 gating
go mod tidy (triggered by GoReleaser's before hook) correctly
promoted both modules from indirect to direct: cloud.google.com/go/auth
is imported by internal/provider/google for the ADC credential
walk, and github.com/atotto/clipboard is imported by internal/tui
for image-paste handling. Listing them as direct reflects actual
usage and prevents tooling from suggesting their removal.
GoReleaser is phasing out the dockers + docker_manifests pair in
favour of dockers_v2, which collapses our four-block setup into
one. The migration also touches Dockerfile (per-platform binary
layout in the build context), so it's worth scheduling as its own
commit rather than a release-time rush.
GHCR's package page auto-links to a GitHub repo via the
org.opencontainers.image.source label. The previous value
pointed at the Gitea canonical (somegit.dev/Owlibou/gnoma),
which GHCR can't resolve — so the package page just showed a
"Link this package to a repository" prompt and contributors,
Readme, and discussions never auto-populated.
Swap the two URL labels: source now points at the GitHub
mirror, url keeps the Gitea canonical reference. Both arch
build blocks updated.
Takes effect on the next release (v0.2.0 images already shipped
with the old labels and stay as-is).
Add a deterministic pre-extractor that skips known-safe token shapes
before they reach the entropy scorer. Targets the false-positive
regime that bites under lowered entropy_threshold or
redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests
(~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs.
Config knob lives under the existing security section to match
entropy_threshold / redact_high_entropy convention:
[security]
entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"]
Empty / unset preserves pre-F-1 behaviour exactly — users opt in.
Per-pattern Debug telemetry fires on every skip (pattern name +
token length, never the token bytes). This is the data F-2's
go/no-go gate depends on; the plan literally specifies it.
NewFirewall validates names at the config boundary and emits a
Warn for unknown entries so a typo like "uid" instead of "uuid"
surfaces loudly instead of silently disabling FP reduction.
Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold,
mixed payload (safe shape + real secret) preserves the secret,
secret-adjacent-to-UUID regression guard, empty safelist preserves
pre-F-1 behaviour, unknown name silently dropped at scanner level
but warned at firewall level, end-to-end FirewallConfig wiring,
and the skip-telemetry log line.
F-2 remains gated on real-workload FP-rate observations.
The worktree commit 12a6b83 dropped the "Native agy JSON output"
backlog item alongside removing the agy agent. Since we restored
agy in this branch, the TODO is relevant again — agy v1.0.0 still
emits plain text and the prompt-augmentation fallback should be
replaced by --output-format stream-json once the CLI supports it.
Switch TestTryLoadOAuthCredentials_Formats to t.TempDir() to drop
the unchecked os.RemoveAll defer that golangci-lint's errcheck
caught after the merge.
Brings in the Google auth precedence work (agy > gemini > ADC
credential walk, fileTokenProvider expiry handling, slog-backed
error reporting), the Codex CLI integration as a new subprocess
agent, and the restoration of the agy subprocess agent that was
accidentally removed by the initial codex commit. Sandbox-bypass
flags on both agy and codex are now opt-out via env vars
(GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX).
Includes review-driven fixes:
- ADC fallback now uses real DetectOptions (cloud-platform scope)
- fileTokenProvider returns an error on expired tokens instead
of shipping a known-dead bearer
- TestNew_Precedence asserts which credential was actually picked
- codex parser tolerates non-JSON banner / debug lines on stdout
- codex usage takes max(input_tokens, prompt_tokens) so accounting
can't silently undercount
No conflicts expected with the dev image-content feature: the
worktree branch only touches the google and subprocess provider
families.
The subprocess CLI table only mentioned three agents; the full set
now is claude, gemini, agy, codex, and vibe (Mistral). Bring the
documentation in line with knownAgents.
The original commit on this branch replaced the agy subprocess agent
with codex (overwriting the slot in knownAgents, deleting agy_test.go
and the agyParser). That was unintentional — agy (antigravity) is a
distinct CLI from codex (OpenAI's). Antigravity will replace gemini
when gemini retires on 2026-06-16, so it needs to keep its own slot.
Restored: FormatAgyText constant, agyParser with newAgyParser and
the line-delimited text parser, the agy CLIAgent entry in
knownAgents with PromptResponseFormat:true, agy_test.go, and the
agy case in newParser. Sourced from the parent commit so behavior
matches what shipped before the codex change.
Sandbox bypass: both agy (--dangerously-skip-permissions) and codex
(--dangerously-bypass-approvals-and-sandbox) need a flag to run
non-interactively (their stdin is closed; without it they block on
approval prompts nobody can answer). Both default to ON for
out-of-box behavior; operators with pre-approved trust config can
opt out via GNOMA_AGY_BYPASS_PERMISSIONS=0 or
GNOMA_CODEX_BYPASS_SANDBOX=0. Tests cover the on / opt-out / unknown
value branches.
TestKnownAgents_ValidFormats updated to accept the restored
FormatAgyText.
Codex emits banner / debug / "starting turn" lines to stdout
interleaved with the JSON event stream. The parser previously
returned an error on any line that wasn't a JSON object, which
subprocessStream.Next treats as terminal — one stray banner
aborted the whole turn. Skip lines that don't start with `{`
after whitespace trim, and downgrade unparseable JSON-looking
lines to a slog.Debug so they don't kill the stream either.
Token accounting: usage payloads from newer codex builds
occasionally carry both input_tokens and prompt_tokens (and
likewise output / completion) with slightly different values.
Always use the larger of the two so we can't silently undercount.
Tests cover non-JSON banner skipping, malformed-JSON
non-fatal-skip, and the max() behavior with both token
fields populated.
credentials.DetectDefault(nil) always returns "options must be
provided", which made the ADC branch unreachable. Pass an explicit
DetectOptions with the cloud-platform scope so users with
GOOGLE_APPLICATION_CREDENTIALS or `gcloud auth application-default
login` actually flow through ADC instead of falling out as
"no credentials found".
fileTokenProvider.Token used to return expired tokens unchanged.
We don't perform an OAuth refresh exchange (the upstream CLI does
that out-of-band into the file we read), so when the file isn't
fresh the only safe move is to fail loudly with an actionable
message rather than ship a known-dead bearer that genai forwards
to Vertex AI and gets back a confusing 401.
tryLoadOAuthCredentials previously swallowed all errors equally,
so the precedence walker silently skipped past misconfigured files
(chmod 0600 on the wrong user, half-written JSON, etc.). Now
os.IsNotExist is silent (normal walking), everything else gets a
slog.Warn with the path so an unreadable file is visible.
selectOAuthCredentials extracts the precedence chain into a
testable helper that also returns a CredentialSource tag
identifying which path was chosen. The previous precedence test
only asserted err == nil; the new test verifies that the agy file
wins when both are present and that the fallback to gemini
actually loads the gemini token.
Pasted images, pasted text, and tool-read files all carry the same
risk class (screenshots with API keys, terminal pastes with creds,
.env reads). Today these are handled inconsistently — incognito
gates persistence but not provider egress, the outgoing-scan
firewall is text-only. Note the cross-cut with Phase F entropy
work and the firewall path so this isn't lost.
Ctrl+V image paste used to write the file to .gnoma/pasted_image_*.png
under the project root, which polluted the workdir and risked
committing screenshots that may contain sensitive content.
Now writes to os.UserCacheDir() / gnoma / pasted-images/ (XDG cache
on Linux, ~/Library/Caches on macOS, %LocalAppData% on Windows).
The directory is created at 0700 and files at 0600 since pasted
content can be sensitive.
Each paste prunes entries older than 2 hours best-effort, so the
cache doesn't accumulate across sessions. The 2h window safely
covers any single turn including provider retries and slow
subprocess CLIs that need the file to still exist on disk when
they ingest the path.
.gitignore: cover the legacy `.gnoma/pasted_image_*` location for
old checkouts; add log.txt and codex_out.jsonl which were tracked
as runtime artifacts during the recent work.
Tests cover cache-path placement, restrictive perms on both the
directory and the file, the no-pollution-of-cwd invariant, and the
prune behavior (stale removed, fresh kept, missing dir no-op).
When the user message has at least one ImageContent block, build a
ChatCompletionContentPartUnionParam array with text + image_url
parts instead of the string content path. Image bytes are inlined
as a base64 data URL (data:<media-type>;base64,...). Adjacent text
blocks are merged into a single TextContentPart. Pure-text user
messages stay on the existing string fast path.
This covers OpenAI direct + every openaicompat backend (Ollama,
llama.cpp, llamafile) since they all share the same provider.
Tests: pure text uses OfString; image present emits 2 content parts
(text + image_url with the expected base64 payload); nil-Image
blocks are dropped and adjacent text merges correctly.
buildUserMessage replaces the unconditional NewUserText wrap inside
SubmitWithOptions. When the active model advertises Vision and the
input contains [Image: /path] markers, the markers are inlined as
ImageContent blocks carrying the file bytes; otherwise the input is
passed through as a single text block (legacy behavior preserved
for subprocess CLIs that auto-ingest paths, e.g. gemini-cli).
image_input.go:
- imageMarkerRe extracts each [Image: ...] occurrence.
- Per marker: validates absolute path, file (not dir), size cap of
10 MiB, image/* media type via http.DetectContentType.
- On any validation failure, the marker is left as literal text and
a warning is recorded — the turn still proceeds.
Routing: latestUserHasImages drives task.RequiresVision in both the
primary stream attempt and the retryOnTransient path, so failover
arms also respect the vision requirement.
Tests cover: no markers (single text block), single image
(bytes captured into Image.Data, MediaType set), missing file
(literal fallback + warning), relative path rejection, oversized
rejection, non-image file rejection, multiple images interleaved
with text.
Task gains a RequiresVision bool; filterFeasible enforces it on
both the primary feasibility pass and the last-resort fallback
(no degradation to a non-vision arm — the model literally cannot
consume image bytes).
Ollama discovery now probes /api/show for vision capability:
- details.families containing "clip" / "mllama" / "*vl"
- capabilities array containing "vision" (newer Ollama)
- name-prefix fallback for releases that predate either
(llava, qwen2.5-vl, llama3.2-vision, moondream, pixtral, etc.)
OllamaProbeResult replaces the map[string]bool tool cache so the
single /api/show call can populate tools + vision + ctx-size in
one probe. DiscoverOllama / DiscoverLocalModels signatures updated;
nil-cache callers in cmd/gnoma keep working unchanged.
RegisterDiscoveredModels propagates SupportsVision into the arm's
Capabilities.Vision.
Tests cover RequiresVision filtering in both the happy path
(vision-only arm chosen when image present) and the fallback path
(non-vision arm rejected even as last resort).
Extends the Content discriminated union with a fifth variant for
inline image payloads. Image carries the raw bytes (captured at
user-input time so the message snapshot is self-contained and
survives source-file deletion), the IANA media type for the
provider's image part, and the original path for logging.
HasImages() lets providers decide whether to fall back to a
text-only representation; providers that don't know about
ContentImage will simply skip those blocks via TextContent().
Bundles the pending TUI work into a coherent batch. Bug fixes from
external review:
* expandPlaceholders: single-pass alternation regex over the original
input prevents `#p\d+` / `#img\d+` tokens inside pasted content from
being re-expanded after the bracket form is inlined.
* /incognito: gate savePromptHistory and the Ctrl+V image-write branch
on `!m.incognito` so the no-persistence contract holds.
* history.txt: write at mode 0600 (chmod existing 0644 files), create
parent dir at 0700, truncate to 500 entries on every save, slog.Warn
on errors instead of swallowing.
* triggerPickerAction: guard m.config.Engine before SetModel, matching
the /model handler.
* Picker key handler: navigation/enter/q consume, escape/ctrl+c close
the picker AND fall through to global handlers (so streaming cancel
and double-tap quit work with an overlay open), default swallows
stray input.
* Paste line count: report total non-empty lines instead of newline
count, ignoring trailing newlines (no more "+0 lines" for "abc").
* Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response
bind. /keys help text updated; picker help entries reordered.
* Tighter perms on .gnoma/pasted_image_*.png (0600).
Race-safety refactor: ApplyTheme used to mutate ~25 package-level
lipgloss styles in place. Replaced with an immutable themeStyles
snapshot and atomic.Pointer[themeStyles] swap. Readers go through a
theme() helper (one atomic load) instead of touching package vars
directly. No locks, no nested-RLock risk if rendering ever moves
off-thread.
Includes pre-existing in-flight work: TUISection in config with
persistent theme/vim settings; /copy /theme /vim slash commands;
provider-name completion; session.SetProvider for the provider picker.
Tests: placeholder_test.go (6 regression + happy-path cases including
the pasted-content collision), history_test.go (5 cases covering perms
on new and existing files, on-disk truncation, blank-input, newline
flattening), provider_test.go (provider switching + picker transitions
+ SLM gating).
Move Distribution out of "In flight" — v0.1.0 shipped: archives on
github.com/VikingOwl91/gnoma/releases and ghcr.io/vikingowl91/gnoma
multi-arch images. Capture remaining optional improvements (Homebrew
tap, curl|sh installer, signed checksums, Windows process-tree kill
via job objects) as follow-ups so they're not lost.
`internal/mcp/transport.go` used syscall.Setpgid and syscall.Kill
unconditionally, both Unix-only. Split the platform bits into
`transport_unix.go` (build tag `!windows`) keeping the existing
process-group semantics, and `transport_windows.go` (build tag
`windows`) falling back to `os.Process.Kill` (kills only the
immediate process — full process-tree kill on Windows would need
golang.org/x/sys/windows + job objects, deferred).
Caught by `goreleaser release --snapshot` cross-compiling for
windows/amd64 and windows/arm64.
- Switch GoReleaser archive release target from Gitea to the GitHub
mirror (VikingOwl91/gnoma). Pre-built archives now publish to
github.com/VikingOwl91/gnoma/releases on each tag.
- README: drop ANTHROPIC_API_KEY-as-example from Quickstart and the
Docker run example; both now reference the Providers table for the
full env-var list. Pre-built binary section points solely at the
GitHub releases page.
- NOTICE: correct copyright holder to vikingowl.
Extend the existing GoReleaser pipeline (linux/darwin/windows ×
amd64/arm64 archives + Gitea release) with multi-arch Docker images
published to ghcr.io/vikingowl91/gnoma.
- Two per-arch `dockers:` blocks build linux/amd64 + linux/arm64 via
buildx, copying the GoReleaser-built static binary into a distroless
base (gcr.io/distroless/static:nonroot — no shell, runs as UID 65532,
ships with CA certs so HTTPS provider calls work).
- `docker_manifests:` stitch them into multi-arch :{Version} and
:latest manifests.
- OCI labels populated (title, source, url, version, revision, created,
licenses).
- Dockerfile uses /workspace as cwd so `docker run -v "$PWD:/workspace"`
mirrors the local invocation model.
Prerequisite: build host needs `docker login ghcr.io` with a PAT
holding `write:packages` for the vikingowl91 namespace.
Top-level docs were stale and the .gitea/ issue templates referenced a
workflow that is no longer in use.
- README: rewrite around the current feature set (SLM routing, profiles,
plugin TOFU, SafeProvider boundary, current model defaults). Add a
pre-built-binary install section plus Docker (ghcr.io) install path
for users without a Go toolchain. Document the GitHub mirror.
- CONTRIBUTING: drop the dead issue-template reference, note Gitea
upstream + GitHub mirror split, expand the package map and test-target
table.
- AGENTS: rebuild as a domain glossary (Elf / Arm / Turn / SafeProvider /
Incognito / Profile) plus non-obvious conventions an outside agent
needs and would not infer from the code.
- TODO: trim completed waves into a History section, fix a broken
link to the never-written Wave 3 plan file, surface active backlog.
- docs/essentials/INDEX: add ADR-004 (PostToolUse hook ordering) to the
ADR list.
- LICENSE + NOTICE: adopt Apache License 2.0. Patent grant matters
because gnoma bundles SDKs from Anthropic / OpenAI / Google / Mistral
and ships derivative tooling that runs untrusted MCP servers.
- Delete .gitea/issue_template/ and gemma-integration-analysis.md
(latter is obsolete per its own preamble — Node.js-specific notes
that don't apply to the Go implementation).
Bump hard-coded provider defaults to the May 2026 lineup:
- Anthropic: claude-sonnet-4-6 (default); Opus 4.7 and Haiku 4.5 in
the fallback list. 4.6/4.7 generation has 1M context standard.
- OpenAI: gpt-5.5 (default); 5.5-pro / 5.2 / 5.2-chat-latest in
fallback. ThinkingModes now baseline on GPT-5.x.
- Google: gemini-3.5-flash (default); 3.1 Pro / Flash Lite in fallback.
- Mistral: mistral-large-latest unchanged (Mistral Large 3); add
mistral-medium-3.5, mistral-medium-2511, mistral-large-2512 to the
rate-limit map.
Legacy dated IDs retained in fallback lists and ratelimits maps so
configs pinned to claude-sonnet-4-20250514 / gpt-4o / gemini-2.5-flash
keep resolving. Capability tables (ContextWindow, MaxOutput,
ThinkingModes) updated to match each generation. CLI help text in
cmd/gnoma/main.go also updated.
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
When a stream errors out before producing any user-visible content
(text, thinking, or tool calls), the engine now transparently retries
on the next-best arm instead of bubbling the error to the TUI. Covers
the case from the post-SLM screenshot: subprocess CLI agents that
exit non-zero on auth/config failures, network drops mid-stream,
rate-limited arms whose error surfaces after Stream() already returned.
Mechanism: the stream-create + consume blocks are wrapped in a labeled
streamLoop. On s.Err() != nil with empty accumulator, the engine emits
a new EventFailover ("↻ <failed_arm> failed (<reason>) — retrying on
another arm"), excludes the failed arm via task.ExcludedArms, and
re-enters the loop. Cap of 4 failovers per round.
Guards:
- !acc.HasContent() — if text/tool calls already streamed, fail loud
rather than duplicate visible output on retry.
- isFailoverable(err) — deny-list approach: context.Canceled/Deadline
and HTTP 400/413 are fatal; everything else (auth, rate limit, 5xx,
subprocess exit, network) is failoverable.
- Router.ForcedArm() == "" — when the user pinned an arm via --provider,
failover is disabled by design.
- failoverAttempt < maxFailovers — bounded retry budget.
TUI renders EventFailover under the existing "cost" role styling.
shortFailReason strips the subprocess wrapper envelope so the user sees
"Invalid API key. Try again." instead of
"subprocess: exit status 1: Error: Invalid API key. Try again.".
Tests cover the classifier (isFailoverable, shortFailReason), end-to-end
auth-error failover, content-already-streamed guard, and context-cancel
guard. Deterministic across 10x -race runs by giving the failing arm
IsCLIAgent=true to anchor it in tier 0 ahead of the API-tier backup.
The router.SecureProvider interface previously required a public
IsSecure() bool method. Any test mock — or future production type —
could satisfy it by returning true, defeating the W1 "only wrapped
providers may flow past the boundary" contract through convention
rather than at the type level.
Replaces IsSecure() bool with an unexported security.Marker interface
that has a single secured() method. Go's method-set semantics key
unexported methods by their defining package, so only types declared in
internal/security can satisfy Marker. *SafeProvider gets the lone
secured() implementation; router.SecureProvider embeds Marker.
The seal forces every test mock that previously implemented IsSecure()
to either (a) be wrapped with security.WrapProvider(mp, nil) at the use
site, or (b) drop the method entirely if the mock never flows through
SecureProvider. 93 use sites across 11 test files were updated via a
per-package secureMock helper. WrapProvider with a nil firewall ref is
a no-op pass-through, so test behavior is unchanged.
Empirically: a type from outside internal/security can declare
`secured()` but the compiler will reject assigning it to
router.SecureProvider because the unexported method belongs to the
other package's namespace. Convention → compile-time guarantee.
3c87527 added engine/paths.go:resolveCanonical, duplicating the
ancestor-walk + EvalSymlinks algorithm that already lived in
fs/guard.go:ResolveWrite. Two implementations of the same TOCTOU defense
is exactly the wrong shape for security code — a bug fix in one would
silently miss the other.
Extracts the shared algorithm to security.CanonicalizePath. Both call
sites become thin wrappers that pre-anchor relative paths against the
appropriate root (cwd for engine, workspace root for guard). The
"hit-root" defensive branch in engine's version (commented "highly
unlikely") is tightened to match guard's error behavior.
Adds focused unit tests for the helper covering existing path,
non-existent leaf, non-existent mid-component, symlinked ancestor, and
relative-path rejection.
3c87527 rewrote DiscoverLlamaCPP to hit /props and emit a single hardcoded
"default" entry. That breaks two cases:
1. Multi-model llama.cpp deployments (llama-swap, model-routing proxies)
are collapsed to a single arm with a placeholder ID.
2. Single-model deployments lose the real model name — arms are
registered as llamacpp/default instead of llamacpp/<actual-id>.
Restores enumeration via /v1/models (the OpenAI-compatible endpoint
llama-server exposes) while keeping the concrete n_ctx read from /props.
/props is now best-effort: failure or missing n_ctx falls back to the
documented default rather than aborting discovery.
Adds three tests: multi-model enumeration with shared context, /props
unreachable, and the empty-/v1/models error path.
3c87527 refactored DiscoverOllama and DiscoverLlamaCPP and dropped two
behaviors:
1. The Ollama toolCache prune loop. Without it, the cache grows
unbounded across reconcile cycles and stale entries linger; a
model that disappears and reappears replays an out-of-date
tool-support verdict because the cache hit skips re-probing.
2. Sensible context-size defaults. Both probes can yield
ContextSize=0 (Ollama: no num_ctx in /api/show parameters;
llama.cpp: /props default_generation_settings without n_ctx).
Registering an arm with ContextWindow=0 misroutes — the post-SLM
two-stage path treats it as a tiny model.
Restores the prune loop, applies 32768 (ollama) / 8192 (llama.cpp) as
fallbacks at discovery time, and adds three tests covering each path.
The full-block private_key regex (BEGIN…END span) added in 3c87527 fails
to match when the END marker is missing — log slices, buffered streams,
or partial dumps that contain only the header and key body would leak
the body. Adds a private_key_header pattern that matches the header
plus the trailing base64 body. Redact merges the overlapping spans into
a single placeholder when both fire on a complete block.
Covered by TestScanner_DetectsTruncatedPrivateKey (no END marker) and
TestRedact_PrivateKeyOverlap_SinglePlaceholder (overlap merge).
Today an arm's stream error (auth, rate limit, subprocess exit) kills
the turn. Backlog item to retry on the next-best arm for the task type
and surface a one-line hint to the user.
- Drop unverified JSONOutput/Vision capability claims on agy (no native
stream-json, no image-input path on v1.0.0).
- Replace agent.Name == "agy" check with PromptResponseFormat flag on
CLIAgent so the prompt-augmented JSON fallback scales to future agents.
- Pass --dangerously-skip-permissions in agy PromptArgs to parallel
gemini --yolo / vibe --trust; required for non-interactive runs.
- Nil-guard JSONSchema and Schema bytes in buildPrompt (previously
panicked when ResponseJSON was requested without a schema).
- Rename misleading TestAgyProvider_StreamAugmentation to
TestAgyParser_EmitsLineDeltas; add coverage for nil-schema path and
non-augmenting agents.
Waves 1-3 + ADR-004 are all merged; the 2026-05-19 external audit's
14 findings are closed. TODO.md no longer needs to track the in-
progress wave or scoped-but-not-drafted waves — they're all done.
Closes the last remaining 2026-05-19 audit finding by documenting the
existing transitive guarantee rather than restructuring the hook
contract.
The audit observed that PostToolUse hooks receive raw tool output
before the firewall scan runs, and proposed reordering or splitting
the event into raw-local-only and redacted-for-LLM variants. After
Wave 1 (SafeProvider boundary at every router arm + non-engine
provider consumer), the audit's threat model is closed transitively:
- Shell hooks see raw output but never reach an LLM.
- Prompt hooks route Stream calls through routerStreamer → router →
arm.Provider, every arm.Provider is now *SafeProvider, outgoing
messages are scanned at the boundary.
- Agent hooks spawn an elf whose engine has Firewall set;
buildRequest scans inline.
Reordering would regress legitimate shell-hook use cases (audit,
forensic, local alert) that need raw access. Splitting the contract
forces every existing hook config to migrate and introduces a
wrong-variant footgun. Neither is justified by the residual risk.
Three changes ship with the ADR:
- ADR-004 records the decision and the conditions for re-opening it.
- Doc comments on hook.PostToolUse and the dispatcher call site in
the engine point at the ADR.
- internal/hook/posttooluse_redaction_test.go locks in the invariant:
a prompt PostToolUse hook firing on a secret-bearing tool result
produces a redacted prompt at the inner provider. If this test
fails, ADR-004's Position A is no longer correct and the audit
finding re-opens.
Closes the cluster of audit findings where gnoma's incognito promise
('no persistence, no learning, local-only routing') silently broke
because state was duplicated across the CLI flag, the firewall's
IncognitoMode, the router's localOnly flag, and the TUI's local
m.incognito field. Wave 2 makes security.IncognitoMode the canonical
source of truth.
W2-1 Router.Select rejects forced non-local arms when localOnly is on
rather than short-circuiting and silently routing to cloud. Main
fails fast when --incognito + --provider <cloud> are combined; the
TUI toggle (Ctrl+X, /incognito, config panel) refuses with an
actionable message when a non-local arm is pinned. Factored the
three duplicated toggle sites into Model.attemptIncognitoToggle.
W2-2 persist.Store.Save consults an IncognitoGate (local interface,
*security.IncognitoMode satisfies it). nil gate = always persist
(legacy behaviour for tests); non-nil gate is consulted on every
Save so TUI runtime toggles take effect without reconstructing the
store. File mode 0o600, dir mode 0o700.
W2-3 tui.New seeds m.incognito from cfg.Firewall.Incognito().Active().
Fixes the Ctrl+X-on-launch-with-incognito case where the first
toggle silently turned the firewall OFF because the local flag
started false out of sync with the firewall.
W2-4 saveQuality gates on both *incognito (defensive, covers the
window before fwRef.Set fires) and fw.Incognito().ShouldLearn() (so
TUI Ctrl+X suppresses the snapshot on exit). Quality restore skipped
under --incognito. Quality file written 0o600 in dir 0o700.
engine.reportOutcome and elf.Manager.ReportResult both gate on
fw.Incognito().ShouldLearn() — bandit signal no longer leaks out of
incognito sessions.
W2-5 session files written 0o600 in dirs 0o700 (was 0o644 / 0o755).
W2-6 IncognitoMode.LocalOnly dropped — dead field with no readers;
routing local-only state lives on the router, not the firewall.
Also wires rtr.SetLocalOnly(true) when --incognito at launch — main
previously activated the firewall's flag but never told the router to
filter, so even without the forced-arm bug, launching with
--incognito alone gave you 'incognito badge but full arm pool'.
Plan for the second hardening wave. Six findings closed in one PR:
W2-1 router rejects forced non-local under local-only; W2-2 persist
store consults IncognitoMode + 0o600/0o700 perms; W2-3 TUI seeds
incognito from firewall; W2-4 quality/outcome gates read firewall
instead of CLI flag; W2-5 session perms 0o600; W2-6 remove dead
IncognitoMode.LocalOnly field.