Three coupled fixes that surfaced from a single FunctionGemma test
session where the SLM-as-execution-arm assumption broke down and
every subsequent prompt failed with 'session not idle (state: error)'.
(A) [slm].register_as_arm config. The SLM has always been
unconditionally registered as both classifier AND tier-0 execution
arm. Fine for general-purpose models (ministral, qwen3-chat); breaks
for task-specialised models (FunctionGemma emits function-call
syntax instead of prose; embedding models can't generate). New
pointer-bool config: nil/absent preserves the historical default
(true), explicit false makes the SLM classifier-only and the
execution path skips the slm/* arm. Three table tests cover absent
/ explicit-false / explicit-true decode paths.
(B) Session error recovery. After any routing or engine error, the
session moved to StateError and stayed there until restart — every
new user prompt got rejected with 'session not idle (state: error)'.
ResetError() was already wired for the /init retry path, but the
general user-input and slash-command paths didn't call it. Added
ResetError() before every user-initiated Send in the TUI so a fresh
prompt always represents intent-to-retry. The /init internal retry
already had its own ResetError; left alone.
(C) filterFeasible per-arm rejection logging. Today's 'no feasible
arm for task X' error tells you THAT every arm was rejected but
nothing about WHY. Added slog.Debug per rejection (arm, task,
complexity, reason, the specific violated constraint) plus a
summary line when zero arms are feasible at any quality. Visible
with --verbose; quiet otherwise. Surface area expansion only — no
behaviour change for users not chasing a bug.
Five fixes folded into one commit because they all answer the same
question: 'why does my router stats output lie to me?'
Issue 1 (timeout). Default classify timeout was 5s — too short for
cold-start ollama loads on small models. Bumped to 15s and surfaced
as [slm].classify_timeout (0 = built-in default). Empirically caught
when a user's reecdev/tiny3.5:1.5b hit 'stream error: context
deadline exceeded' on every single classify call.
Issue 2 (Warn-level error). The SLM-fallback path logged the
underlying error at Debug, invisible without --verbose. Promoted to
Warn so a first-time misconfiguration surfaces immediately. The
fallback itself is benign; the signal is that the SLM isn't doing
the work it was supposed to.
Issue 3 (stats hint). Hard-coded 'check that llamafile boots' even
when the user is on ollama. Replaced with backend-templated advice
read from cfg.SLM.Backend. Also distinguishes three diagnostic
cases that were collapsed before:
- SLM never called (zero attempts)
- SLM called N times but every call fell back (timeout/parse)
- SLM working but minority share
Issue 4 (effective heuristic share). The classifier breakdown
shows 'heuristic' and 'slm_fallback' as separate sources, but both
routed through HeuristicClassifier — only the source tag differs.
New line under 'total observations' surfaces the combined share
honestly: 'effective heuristic share: 100% (44 fallbacks + 10
pure heuristic)'.
Issue 5 (config schema). [slm].classify_timeout joins the existing
[slm] knobs alongside startup_timeout. Documented inline with the
cold-start-load rationale.
Four hardcoded constants in the selector and feedback tracker are now
user-tunable via [router.bandit]:
- quality_alpha (EMA smoothing, default 0.3)
- min_observations (samples before observed overrides heuristic, default 3)
- observed_weight (observed/heuristic blend ratio, default 0.7)
- strength_bonus (quality bonus for Strengths-tagged arms, default 0.15)
Each field treats 0 as 'use default', so an empty TOML block is
byte-identical to pre-config behaviour. BanditParams is plumbed via
router.Config{Bandit: ...} and resolveBanditParams() centralises the
fallback so every call site shares the same defaults.
QualityTracker, scoreArm, bestScored, and selectBest signatures now
take the configured values directly rather than reaching for package-
level constants. Tests updated to pass BanditParams{} (defaults) or
explicit overrides where they validate the new tuning paths.
Tracks item #3 from the 'Bandit selector — design decisions deferred'
TODO entry — ships independently of the EMA vs SLM strategic decision.
Implements S-1 through S-7 of the startup-safety-banner plan.
Adds a pre-launch safety check that classifies the current working
directory into three tiers and gates the launch:
TierRefuse /, /etc, /sys, /proc, /usr, /var, /bin, /sbin, /boot,
/root, /dev (Linux) and /System, /Library, /private,
/Applications (macOS). Refuses with exit 2 unless
--dangerously-allow-anywhere is passed.
TierWarn $HOME, ~/Desktop, ~/Downloads, ~/Documents, ~/.config,
~/.local, ~/.cache, /tmp, and similar dumping grounds.
Prints a banner and reads a single y/Y from stdin to
confirm; any other input (or EOF, including piped/
scripted invocation) aborts with exit 1.
TierOK Anywhere with a recognized project marker (.gnoma/,
go.mod, package.json, pyproject.toml, Cargo.toml,
Makefile, Dockerfile, build.gradle*, pom.xml) or
inside a git repo. No prompt; banner only.
Project markers and git-repo presence override the TierWarn check —
a project dir inside $HOME stays TierOK. The require_project_marker
config knob can flip that for strict users.
Container detection: when /.dockerenv or /run/.containerenv exists,
TierRefuse downgrades to TierWarn (devcontainers often chroot to /
or similar). Best-effort; false positives only soften the gate.
The context banner is always rendered (TierOK, TierWarn, TierRefuse
alike) and summarizes: cwd, git branch + dirty state, project type,
provider/model, modes (permission, incognito, prefer), and a
top-level sensitive-file inventory. Inventory matches .env,
.env.*, env.local; private-key extensions (.pem, .key, .crt, .p12,
.pfx); SSH key names (id_rsa, id_ed25519, ...); credentials files;
.netrc / .pgpass; KeePass vaults; and .ssh/ .aws/ .kube/ .gcloud/
.azure/ .docker/ directories. Precision-tested: .envrc and
secret_handler.go do NOT match. Bounded at 1000 entries.
Architecture:
- internal/safety/cwd.go — Classification + symlink-resolving tier
classifier with platform-specific roots and container detection.
- internal/safety/sensitive.go — pattern-based top-level scanner,
deterministic ordering, scanLimit guard against pathological dirs.
- internal/safety/banner.go — pure render functions for the warn
prefix, refuse message, and context banner. Safe for golden-string
testing.
- internal/config/config.go — new [safety] section with three
config keys, defaults applied via ResolvedSafety() helper. Pointer
fields distinguish "user omitted" from "user set to false."
- cmd/gnoma/main.go — gate runs after subcommand dispatch (so
`gnoma providers / profile / slm / router` skip the prompt) and
before provider creation. --dangerously-allow-anywhere bypasses
the gate with an explicit log warning.
The runtime keypress reads up to 8 bytes from os.Stdin and accepts
only "y" / "Y" trimmed; EOF returns false (piped invocations
without the flag will abort). Documented in the readYesConfirmation
helper. Manual smoke (per plan):
- `cd / && gnoma -p test` → refuses
- `cd ~ && gnoma` → warns + keypress
- `cd ~/git/some-repo && gnoma` → banner only
- subcommands skip the gate entirely
Linux + macOS classification; Windows path handling deferred per
plan (treated as TierOK there until follow-up).
Refs: docs/superpowers/plans/2026-05-23-startup-safety-banner.md
Implements P-1 through P-6 of the prefer-routing-policy plan.
Adds a config knob that biases routing toward local arms, cloud
arms, or leaves selection unchanged. Default "auto" is
byte-identical to pre-change behavior (the new armTier path with
PreferAuto returns the same value as the old single-arg function).
Mechanism diverged from the plan after empirical testing:
The plan called for a score multiplier applied in bestScored.
Tests revealed the existing cost-floor math (scoreArm divides by
weighted cost which collapses to ~0.001 for free local arms) gives
local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier
can't overcome. A tier-shift in armTier turned out cleaner:
PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent)
get +2 tier shift, landing behind locals.
PreferCloud: IsLocal arms get +2 tier shift, landing behind
cloud. SLM tier-0 arms shift to tier 2 — still
below cloud's tier 3 — so the SLM-protection
semantic (small stuff stays on the small model)
survives PreferCloud. This matches the open
question in the plan, now resolved as: yes, SLMs
keep winning under PreferCloud by design.
The policyMultiplier was kept in bestScored as a within-tier
nudge (mostly cosmetic in practice given the cost-floor dynamics
described above; could matter when costs are calibrated). Worth
revisiting once router-wide cost calibration lands.
Strengths cross-tier promotion is unaffected: the promoted-set
path in selectBest bypasses armTier entirely, so a strongly-tagged
cloud arm still wins SecurityReview tasks under PreferLocal
(validated by TestPreferPolicy_StrengthsBeatsMultiplier).
CLI-agent subprocess arms count as "local" for PreferLocal
purposes — they proxy to cloud but the user-visible behavior is
local. Users who want to exclude them can use --provider X.
Forced arms (--provider X) and incognito take priority over the
policy: forced arm test pins this, incognito-still-wins test pins
the LocalOnly hard filter dominating PreferCloud.
Test coverage (prefer_test.go): ParsePreferPolicy / String round
trips; policyMultiplier table; acceptance scenarios across all
three policies with adjacent-tier arms; SLM-still-wins under
PreferCloud; Strengths beats multiplier; forced-arm bypass;
incognito beats prefer; lone cloud arm wins when no local feasible.
Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md
Add a deterministic pre-extractor that skips known-safe token shapes
before they reach the entropy scorer. Targets the false-positive
regime that bites under lowered entropy_threshold or
redact_high_entropy = true — UUIDs (~3.4 bits), SHA hex digests
(~3.9 bits), ISO-8601 timestamps, and HTTP(S) URLs.
Config knob lives under the existing security section to match
entropy_threshold / redact_high_entropy convention:
[security]
entropy_safelist = ["uuid", "sha_hex", "iso8601", "url"]
Empty / unset preserves pre-F-1 behaviour exactly — users opt in.
Per-pattern Debug telemetry fires on every skip (pattern name +
token length, never the token bytes). This is the data F-2's
go/no-go gate depends on; the plan literally specifies it.
NewFirewall validates names at the config boundary and emits a
Warn for unknown entries so a typo like "uid" instead of "uuid"
surfaces loudly instead of silently disabling FP reduction.
Tests cover: UUID/SHA-1/SHA-256 skipped at lowered threshold,
mixed payload (safe shape + real secret) preserves the secret,
secret-adjacent-to-UUID regression guard, empty safelist preserves
pre-F-1 behaviour, unknown name silently dropped at scanner level
but warned at firewall level, end-to-end FirewallConfig wiring,
and the skip-telemetry log line.
F-2 remains gated on real-workload FP-rate observations.
Bundles the pending TUI work into a coherent batch. Bug fixes from
external review:
* expandPlaceholders: single-pass alternation regex over the original
input prevents `#p\d+` / `#img\d+` tokens inside pasted content from
being re-expanded after the bracket form is inlined.
* /incognito: gate savePromptHistory and the Ctrl+V image-write branch
on `!m.incognito` so the no-persistence contract holds.
* history.txt: write at mode 0600 (chmod existing 0644 files), create
parent dir at 0700, truncate to 500 entries on every save, slog.Warn
on errors instead of swallowing.
* triggerPickerAction: guard m.config.Engine before SetModel, matching
the /model handler.
* Picker key handler: navigation/enter/q consume, escape/ctrl+c close
the picker AND fall through to global handlers (so streaming cancel
and double-tap quit work with an overlay open), default swallows
stray input.
* Paste line count: report total non-empty lines instead of newline
count, ignoring trailing newlines (no more "+0 lines" for "abc").
* Ctrl+O restored to expand-output; Ctrl+Y is the new copy-response
bind. /keys help text updated; picker help entries reordered.
* Tighter perms on .gnoma/pasted_image_*.png (0600).
Race-safety refactor: ApplyTheme used to mutate ~25 package-level
lipgloss styles in place. Replaced with an immutable themeStyles
snapshot and atomic.Pointer[themeStyles] swap. Readers go through a
theme() helper (one atomic load) instead of touching package vars
directly. No locks, no nested-RLock risk if rendering ever moves
off-thread.
Includes pre-existing in-flight work: TUISection in config with
persistent theme/vim settings; /copy /theme /vim slash commands;
provider-name completion; session.SetProvider for the provider picker.
Tests: placeholder_test.go (6 regression + happy-path cases including
the pasted-content collision), history_test.go (5 cases covering perms
on new and existing files, on-disk truncation, blank-input, newline
flattening), provider_test.go (provider switching + picker transitions
+ SLM gating).
Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
Adds opt-in user profiles for swapping API keys, CLI binaries, and
permission modes between contexts (work/private/experiment/...).
Profile mode engages only when ~/.config/gnoma/profiles/ exists, so
existing single-config installations are untouched. Selection order:
--profile flag → default_profile in base config → fatal error.
Layering: defaults → ~/.config/gnoma/config.toml → profiles/<name>.toml
→ <projectRoot>/.gnoma/config.toml → env. Map sections merge per-key;
[[arms]] and [[mcp_servers]] merge by id/name; [[hooks]] appends.
Per-profile data: quality-<name>.json and sessions/<name>/ keep the
bandit and session list from cross-contaminating between profiles.
Profile names restricted to [A-Za-z0-9_-] to block --profile=../foo
path traversal into derived paths.
Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md
(static portion; dynamic bandit-driven promotion deferred to D-2).
Routing previously let tier ordering (CLI > local > API) dominate
selection — Opus, in tier 3, would lose to a tier-1 CLI agent for
SecurityReview even though Opus is empirically stronger at that task.
This change introduces explicit per-arm overrides:
[[arms]]
id = "anthropic/claude-opus-4-7"
strengths = ["security_review", "planning"]
cost_weight = 0.3
Strengths gate cross-tier promotion: arms matching task.Type bypass
the tier loop and compete with each other directly. Promotion is a
preference, not a pin — if no strength-tagged arm is feasible
(backoff, pool capacity, tool support), selection falls through to
the default tier order.
CostWeight linearly dampens the cost penalty in scoreArm via
effectiveCost = 1 + CostWeight * (cost - 1)
CostWeight=1.0 (or unset) preserves current behavior; lower values
trade cheapness for quality. The earlier draft used cost^CostWeight
which inverts direction for sub-1 local-arm costs (raising a
fraction <1 to a fractional power makes it bigger, not smaller); a
monotonicity regression test prevents that drift.
- internal/router/arm.go: Strengths []TaskType, CostWeight float64,
HasStrength(), ResolvedCostWeight() (zero → 1.0).
- internal/router/selector.go: scoreArm strength bonus const
(strengthScoreBonus = 0.15) + linear cost dampening; selectBest
cross-tier promotion before tier loop.
- internal/router/router.go: ArmOverride type + ApplyArmOverrides()
returns unknown IDs; unknown strength names skipped with per-name
warning via slog.
- internal/router/task.go: ParseTaskTypeStrict() returns ok bool;
ParseTaskType now delegates so the two switches stay in sync.
- internal/config/config.go: ArmConfig + [[arms]] TOML wiring.
- cmd/gnoma/main.go: applies overrides after all initial arms
register; logs a warning when an [[arms]] id has no matching
registered arm.
Tests cover: predicate helpers, scoring direction across two arms,
linear-formula monotonicity on both sides of cost=1, cross-tier
promotion, empty-Strengths preserves tier order, promoted arm in
backoff falls through via full Router.Select path, observed-quality
tiebreak between two strength-tagged arms, ApplyArmOverrides happy
path + unknown-ID reporting + unknown-strength skipping.
Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.
Users with aliased CLI binaries (claude-priv, claude-work,
gemini-personal) can now point gnoma's auto-discovery at them
without renaming. The override flows through to the actual subprocess
spawn at internal/provider/subprocess/provider.go:56, so routing
through the alias is functional, not cosmetic.
Config:
[cli_agents]
claude = "claude-priv" # discovery uses claude-priv instead of claude
gemini = "" # empty value = no override (fall back to canonical)
# vibe is absent = canonical name used
- internal/config/config.go: CLIAgentsSection map[string]string;
TOML [cli_agents] key.
- internal/provider/subprocess/agent.go:
- Package-level lookPath = exec.LookPath for test injection.
- resolveAgentBinary(canonical, override) → (path, binName, err).
Override='' falls back to canonical. Override set but missing from
PATH returns an error (no silent fallback — masks user typos).
- DiscoveredAgent.OverrideBinary records the override binary name
when one was used; empty otherwise.
- DiscoverCLIAgents(ctx, overrides) signature; warning logged when
an override is configured but the binary isn't on PATH.
- cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The
`gnoma providers` listing renders `claude-priv (via [cli_agents].claude)`
when an override is in effect.
Tests cover: 5 resolver cases (no override, override set, empty
override falls back, override missing, canonical missing); 4
discovery cases (no overrides, override resolves alias, empty value
falls back, override missing skips agent); 2 config round-trip cases.
Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.
Small local SLMs (<=16k context) waste ~1500 tokens per turn on the
full tool catalogue. Two-stage routing replaces round-1 tools with a
single synthetic select_category schema; round-2+ sends only the
selected category's real tool schemas plus select_category for
re-selection.
- internal/tool/category.go: Category type, optional Categorized
interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read,
fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec.
- internal/engine/twostage.go: synthetic select_category tool,
intercept helper, per-turn selectedCategory state under e.mu.
- Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to
prose. State resets at the top and end of every runLoop.
- Activates automatically on a forced local arm with ContextWindow
<=16384, or via [router].force_two_stage TOML key.
- Integration test drives a 3-round trip and asserts: round 1 emits
exactly one schema (synthetic) with ToolChoiceRequired, round 2
contains only write-category schemas + select_category, real
fs.write executes. Invalid-category fallback round-trips back to
round-1 mode.
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:
1. llamafile cold-start blocked pipe-mode runs (always faster than
the 15 s health check)
2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
(ToolUse=false) from 9/10 task types
3. armTier hard-coded CLI agents > local > API, so even when the SLM
arm was feasible a CLI agent won
Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.
Backend layer (the bigger change)
The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:
- ollama (any local Ollama daemon)
- llamacpp (any llama.cpp server)
- llamafile (gnoma-managed, current behaviour)
- openaicompat (LM Studio, vLLM, remote API)
- auto (probes in order, picks first reachable)
- disabled
[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.
Trivial-prompt heuristic (Gate 2)
ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
Complexity-aware tier ordering (Gate 3)
armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
Eager boot with user-facing wait (Gate 1)
Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.
waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
Classifier reliability
Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.
Probe + telemetry surfaces
gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.
`gnoma router stats` already (from the previous commit) shows the
classifier-source mix; with this change you can finally see slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.
Tests
- 9 new backend-factory tests (httptest-backed Ollama probe, error
paths, auto-detection, capability flags)
- Tier-ordering tests cover the new "specialised small arm wins
trivial task" path
- Trivial-prompt heuristic tested for both halves (knowledge-only
flips RequiresTools=false; debug/file/run keeps it true)
Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
heuristic baseline blended so Priority/RequiredEffort are never zeroed,
extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
Complete the remaining M8 extensibility deliverables:
- MCP client with JSON-RPC 2.0 over stdio transport, protocol
lifecycle (initialize/tools-list/tools-call), and process group
management for clean shutdown
- MCP tool adapter implementing tool.Tool with mcp__{server}__{tool}
naming convention and replace_default for swapping built-in tools
- MCP manager for multi-server orchestration with parallel startup,
tool discovery, and registry integration
- Plugin system with plugin.json manifest (name/version/capabilities),
directory-based discovery (global + project scopes with precedence),
loader that merges skills/hooks/MCP configs into existing registries,
and install/uninstall/list lifecycle manager
- Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out
enabled/disabled resolution
- TUI /plugins command for listing installed plugins
- 54 tests across internal/mcp and internal/plugin packages
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
args in the first streaming chunk then repeats them as delta, causing
doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
cmd/gnoma:
- Early TTY detection so logger is created with correct destination
before any component gets a reference to it (fixes slog WARN bleed
into TUI textarea)
permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
text may legitimately mention .env/.ssh/credentials patterns and
should not be blocked
tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
(ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
<<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start