Closes R-4 and R-5 of the routing-defaults plan.
R-4: Strengths + CostWeight defaults for closed frontier models.
Cloud entries land in the same knownFamilyDefaults table as local
ones, with MaxComplexity intentionally left zero (cloud arms get
no complexity ceiling). CostWeight tuned per the plan's rationale:
claude-opus-4-7 → Planning/SecurityReview/Debug/Refactor, 0.3
claude-sonnet-4-6 → Generation/Refactor/Review, 0.7
gpt-5.5 → Planning/SecurityReview/Generation, 0.3
gpt-5.3-codex → Generation/Refactor/Debug/UnitTest, 0.6
gpt-5.2 → Orchestration/Review, 0.8
gemini-3.1-pro → Planning/Review/Orchestration, 0.5
gemini-3.5-flash → Boilerplate/Explain/Orchestration, 1.2
The 0.3 weight on frontier arms keeps them competitive on
SecurityReview / Planning despite $4+/Mtok; 1.2 on Gemini Flash
penalizes cost more so it only wins when cost is genuinely
decisive (boilerplate, explain).
Mechanism: extracted applyFamilyDefaults into defaults.go and call
it from Router.RegisterArm. Single source of truth — both local
discovery and the primary-provider path in cmd/gnoma/main.go now
flow through the same defaults application. Removed the duplicate
apply block from RegisterDiscoveredModels.
Legacy model IDs (claude-opus-4-20250514, gpt-4o, o3, gemini-2.5-pro,
etc.) intentionally do not match any table entry — keeps users on
pinned older models safe from imposed 2026 Strengths.
R-5: gpt-5.3-codex registration.
- internal/provider/openai/provider.go: added to fallbackModels
and inferOpenAIModelCapabilities (400K context, 32K output).
- internal/provider/ratelimits.go: gpt-5.3-codex and its dated
alias gpt-5.3-codex-2026-02-15 added with the same Tier 1
quotas as gpt-5.2.
Gemini 3.x (3.1-pro-preview, 3.5-flash, 3.1-flash-lite) was already
registered in both google/provider.go and ratelimits.go — no change
needed for that part of R-5.
Test coverage:
- ResolveFamilyDefaults table-driven across all 7 cloud entries
including prefix-sharing (gpt-5.5-pro → gpt-5.5 defaults,
gemini-3.1-pro-preview → gemini-3.1-pro defaults).
- Legacy IDs return !ok.
- RegisterArm applies cloud defaults end-to-end.
- User-supplied Strengths and CostWeight are not overridden.
- ID.Model() fallback works when ModelName is empty (test code
often constructs arms this way).
Refs: docs/superpowers/plans/2026-05-23-routing-defaults-refresh.md
Bump hard-coded provider defaults to the May 2026 lineup:
- Anthropic: claude-sonnet-4-6 (default); Opus 4.7 and Haiku 4.5 in
the fallback list. 4.6/4.7 generation has 1M context standard.
- OpenAI: gpt-5.5 (default); 5.5-pro / 5.2 / 5.2-chat-latest in
fallback. ThinkingModes now baseline on GPT-5.x.
- Google: gemini-3.5-flash (default); 3.1 Pro / Flash Lite in fallback.
- Mistral: mistral-large-latest unchanged (Mistral Large 3); add
mistral-medium-3.5, mistral-medium-2511, mistral-large-2512 to the
rate-limit map.
Legacy dated IDs retained in fallback lists and ratelimits maps so
configs pinned to claude-sonnet-4-20250514 / gpt-4o / gemini-2.5-flash
keep resolving. Capability tables (ContextWindow, MaxOutput,
ThinkingModes) updated to match each generation. CLI help text in
cmd/gnoma/main.go also updated.
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
args in the first streaming chunk then repeats them as delta, causing
doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
cmd/gnoma:
- Early TTY detection so logger is created with correct destination
before any component gets a reference to it (fixes slog WARN bleed
into TUI textarea)
permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
text may legitimately mention .env/.ssh/credentials patterns and
should not be blocked
tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
(ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
<<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start