The 'What makes gnoma different' bullet and Security section both
implied a network-egress firewall. Today the Firewall only enforces a
content boundary (secret scan, Unicode sanitize, redact/block). Reword
both spots and add a Scope note. Surface the gap as a top-of-TODO
entry covering per-session audit log and per-host egress allowlist,
with the open design question (host-level vs per-tool) called out.
Raised via r/SideProject v0.3.0 launch thread.
Three plans shipped end-to-end in v0.3.0; removing them from
TODO.md In-flight and adding a Status: shipped header to each
plan doc with the commit references.
Shipped:
- 2026-05-23-routing-defaults-refresh.md
- 2026-05-23-prefer-routing-policy.md
- 2026-05-23-startup-safety-banner.md
Still in flight (telemetry-gated, fires only if measurements
support it):
- 2026-05-23-tool-router-specialization.md
Two parallel pre-flight plans surfaced in the 2026-05-23 session,
both deferred while the routing-defaults-refresh implementation
landed. Drafted as separate plans because they're independent:
the prefer-policy is a router scoring change; the safety banner is
a launch-time check that never touches the router.
prefer-routing-policy
[router].prefer = "local" | "cloud" | "auto" — soft score
multiplier (0.3 / 0.5 / 1.0) biasing toward local or cloud arms
while preserving Strengths cross-tier promotion and bandit
learning. Default "auto" is byte-identical to current behavior.
Forced arms and incognito retain priority. CLI-agent subprocess
arms count as non-local for this knob (they proxy to cloud).
startup-safety-banner
Three-tier cwd classification at launch — refuse in /etc /sys
and other system roots; warn+keypress in $HOME, /tmp, ~/Desktop,
~/Downloads; OK inside any git repo or directory with a project
marker (.gnoma/, go.mod, package.json, etc.). Always shows a
context banner with cwd, git state, model, modes, and a
top-level sensitive-file inventory (.env, id_rsa, *.pem, .ssh/,
etc. — informational only, no recursion, capped at 1000 entries).
Bypass via --dangerously-allow-anywhere. Complements the in-flight
sensitive-content unified-policy TODO item: this is the pre-flight
layer, that is the runtime input-path layer.
Both plans default-on with safe defaults; both have explicit
out-of-scope sections to prevent scope creep during implementation.
Linux + macOS first; Windows path classification deferred.
TODO.md surfaces both as in-flight.
Discovery previously registered every model returned by Ollama as a
chat arm, including embeddings, ASR, TTS, audio realtime, and
rerankers — which then failed at inference time when the router
selected them. Local arms also shipped with all-zero defaults, so
selection between e.g. tiny3.5:1.5b, phi-4:14b, and qwen3-coder:30b
was effectively random.
This change covers tasks R-1, R-2, R-6 from the routing-defaults plan.
- nonChatModelPatterns + isNonChatModel substring matcher; matched
IDs are skipped during RegisterDiscoveredModels. Covers whisper,
moonshine, kokoros, vibevoice, -asr, -tts, -audio, -embedding,
embeddinggemma, -reranker, lfm2.
- knownVisionModelPrefixes gains gemma4, gemma-4, glm-ocr. gemma3
and minicpm-v entries stay for regression coverage.
- New internal/router/defaults.go with FamilyDefaults struct,
knownFamilyDefaults map, and ResolveFamilyDefaults longest-prefix
lookup (with org/-namespace stripping so reecdev/tiny3.5:1.5b
resolves to "tiny3.5"). Single entry for now: functiongemma is
registered with Disabled=true and MaxComplexity=0.40, reserved for
the future ArmRoleToolRouter path. Table will grow in R-3.
- RegisterDiscoveredModels consults ResolveFamilyDefaults and only
populates fields that are still zero on the arm, so user [[arms]]
overrides keep priority.
Plans:
- docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
- docs/superpowers/plans/2026-05-23-tool-router-specialization.md
TODO.md surfaces both as in-flight items.
GoReleaser is phasing out the dockers + docker_manifests pair in
favour of dockers_v2, which collapses our four-block setup into
one. The migration also touches Dockerfile (per-platform binary
layout in the build context), so it's worth scheduling as its own
commit rather than a release-time rush.
Add a deterministic pre-extractor that skips known-safe token shapes
before they reach the entropy scorer. Targets the false-positive
regime that bites under lowered entropy_threshold or
redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests
(~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs.
Config knob lives under the existing security section to match
entropy_threshold / redact_high_entropy convention:
[security]
entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"]
Empty / unset preserves pre-F-1 behaviour exactly — users opt in.
Per-pattern Debug telemetry fires on every skip (pattern name +
token length, never the token bytes). This is the data F-2's
go/no-go gate depends on; the plan literally specifies it.
NewFirewall validates names at the config boundary and emits a
Warn for unknown entries so a typo like "uid" instead of "uuid"
surfaces loudly instead of silently disabling FP reduction.
Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold,
mixed payload (safe shape + real secret) preserves the secret,
secret-adjacent-to-UUID regression guard, empty safelist preserves
pre-F-1 behaviour, unknown name silently dropped at scanner level
but warned at firewall level, end-to-end FirewallConfig wiring,
and the skip-telemetry log line.
F-2 remains gated on real-workload FP-rate observations.
The worktree commit 12a6b83 dropped the "Native agy JSON output"
backlog item alongside removing the agy agent. Since we restored
agy in this branch, the TODO is relevant again — agy v1.0.0 still
emits plain text and the prompt-augmentation fallback should be
replaced by --output-format stream-json once the CLI supports it.
Switch TestTryLoadOAuthCredentials_Formats to t.TempDir() to drop
the unchecked os.RemoveAll defer that golangci-lint's errcheck
caught after the merge.
Brings in the Google auth precedence work (agy > gemini > ADC
credential walk, fileTokenProvider expiry handling, slog-backed
error reporting), the Codex CLI integration as a new subprocess
agent, and the restoration of the agy subprocess agent that was
accidentally removed by the initial codex commit. Sandbox-bypass
flags on both agy and codex are now opt-out via env vars
(GNOMA_AGY_BYPASS_PERMISSIONS, GNOMA_CODEX_BYPASS_SANDBOX).
Includes review-driven fixes:
- ADC fallback now uses real DetectOptions (cloud-platform scope)
- fileTokenProvider returns an error on expired tokens instead
of shipping a known-dead bearer
- TestNew_Precedence asserts which credential was actually picked
- codex parser tolerates non-JSON banner / debug lines on stdout
- codex usage takes max(input_tokens, prompt_tokens) so accounting
can't silently undercount
No conflicts expected with the dev image-content feature: the
worktree branch only touches the google and subprocess provider
families.
Pasted images, pasted text, and tool-read files all carry the same
risk class (screenshots with API keys, terminal pastes with creds,
.env reads). Today these are handled inconsistently — incognito
gates persistence but not provider egress, the outgoing-scan
firewall is text-only. Note the cross-cut with Phase F entropy
work and the firewall path so this isn't lost.
Move Distribution out of "In flight" — v0.1.0 shipped: archives on
github.com/VikingOwl91/gnoma/releases and ghcr.io/vikingowl91/gnoma
multi-arch images. Capture remaining optional improvements (Homebrew
tap, curl|sh installer, signed checksums, Windows process-tree kill
via job objects) as follow-ups so they're not lost.
Top-level docs were stale and the .gitea/ issue templates referenced a
workflow that is no longer in use.
- README: rewrite around the current feature set (SLM routing, profiles,
plugin TOFU, SafeProvider boundary, current model defaults). Add a
pre-built-binary install section plus Docker (ghcr.io) install path
for users without a Go toolchain. Document the GitHub mirror.
- CONTRIBUTING: drop the dead issue-template reference, note Gitea
upstream + GitHub mirror split, expand the package map and test-target
table.
- AGENTS: rebuild as a domain glossary (Elf / Arm / Turn / SafeProvider /
Incognito / Profile) plus non-obvious conventions an outside agent
needs and would not infer from the code.
- TODO: trim completed waves into a History section, fix a broken
link to the never-written Wave 3 plan file, surface active backlog.
- docs/essentials/INDEX: add ADR-004 (PostToolUse hook ordering) to the
ADR list.
- LICENSE + NOTICE: adopt Apache License 2.0. Patent grant matters
because gnoma bundles SDKs from Anthropic / OpenAI / Google / Mistral
and ships derivative tooling that runs untrusted MCP servers.
- Delete .gitea/issue_template/ and gemma-integration-analysis.md
(latter is obsolete per its own preamble — Node.js-specific notes
that don't apply to the Go implementation).
When a stream errors out before producing any user-visible content
(text, thinking, or tool calls), the engine now transparently retries
on the next-best arm instead of bubbling the error to the TUI. Covers
the case from the post-SLM screenshot: subprocess CLI agents that
exit non-zero on auth/config failures, network drops mid-stream,
rate-limited arms whose error surfaces after Stream() already returned.
Mechanism: the stream-create + consume blocks are wrapped in a labeled
streamLoop. On s.Err() != nil with empty accumulator, the engine emits
a new EventFailover ("↻ <failed_arm> failed (<reason>) — retrying on
another arm"), excludes the failed arm via task.ExcludedArms, and
re-enters the loop. Cap of 4 failovers per round.
Guards:
- !acc.HasContent() — if text/tool calls already streamed, fail loud
rather than duplicate visible output on retry.
- isFailoverable(err) — deny-list approach: context.Canceled/Deadline
and HTTP 400/413 are fatal; everything else (auth, rate limit, 5xx,
subprocess exit, network) is failoverable.
- Router.ForcedArm() == "" — when the user pinned an arm via --provider,
failover is disabled by design.
- failoverAttempt < maxFailovers — bounded retry budget.
TUI renders EventFailover under the existing "cost" role styling.
shortFailReason strips the subprocess wrapper envelope so the user sees
"Invalid API key. Try again." instead of
"subprocess: exit status 1: Error: Invalid API key. Try again.".
Tests cover the classifier (isFailoverable, shortFailReason), end-to-end
auth-error failover, content-already-streamed guard, and context-cancel
guard. Deterministic across 10x -race runs by giving the failing arm
IsCLIAgent=true to anchor it in tier 0 ahead of the API-tier backup.
Today an arm's stream error (auth, rate limit, subprocess exit) kills
the turn. Backlog item to retry on the next-best arm for the task type
and surface a one-line hint to the user.
Waves 1-3 + ADR-004 are all merged; the 2026-05-19 external audit's
14 findings are closed. TODO.md no longer needs to track the in-
progress wave or scoped-but-not-drafted waves — they're all done.
Advisor flagged that engine.Config.Provider stayed raw, so the safety
property was 'every call goes through buildRequest' instead of the
stronger 'every Stream call routes through a SafeProvider.' Wrap it
even though buildRequest still scans inline — at worst this costs one
extra idempotent scan pass; it removes the 'someone adds a fifth engine
Stream site that skips buildRequest' failure mode.
Engine.SetProvider gets a doc comment establishing the wrap contract
for callers. No active callers today, but documenting it now prevents
the future bypass.
Confirmed elf engines inherit the wrap automatically:
- elf.Manager.Spawn passes arm.Provider (already *SafeProvider after
W1-3a)
- elf.Manager.SpawnWithProvider has no callers — dead code path
Added the Wave 1 plan to TODO.md under active plans.
New dated plan at docs/superpowers/plans/2026-05-19-post-slm-unlock.md
covers the work surfaced during this session that hasn't shipped yet:
Phase A — two-stage tool routing (last item from the original
smallcode audit; gates on local + small-context arms; saves ~70% of
schema tokens per request).
Phase B — CLI agent binary override. [cli_agents] config section lets
users map canonical agent names (claude / gemini / vibe) onto local
aliases (claude-priv, gemini-work, etc.).
Phase C — user profiles. Multiple named configs (work / private /
experiment) layered over a base config.toml, switchable via
--profile flag, [config].default_profile, and a /profile TUI command.
Phase D — per-arm capability tags (Phase-4 prep). Per-arm Strengths
[]TaskType and CostWeight to make the router actually pick Opus over
Gemini for Planning/SecurityReview etc., not just for cost reasons.
Phase E — compound tools (deferred until SLM-arm telemetry shows
which chain patterns fail).
Plus an explicit drop list of things we considered and won't ship.
TODO.md updated to point at the new plan and note that the original
roadmap's Phase 4 is now superseded.
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
args in the first streaming chunk then repeats them as delta, causing
doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
cmd/gnoma:
- Early TTY detection so logger is created with correct destination
before any component gets a reference to it (fixes slog WARN bleed
into TUI textarea)
permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
text may legitimately mention .env/.ssh/credentials patterns and
should not be blocked
tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
(ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
<<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start