vikingowl 5569d4fb86 docs: consolidated roadmap, ADR-013, drop stale plans

- New 7-phase roadmap (2026-05-07-gnoma-roadmap.md) covering M8 cleanup,
  PTY interactive shell, SLM classifier, router revisit, USP security,
  ELF support, and distribution
- ADR-013 (002-slm-routing.md): SLM-first routing supersedes ADR-009;
  Thompson Sampling deferred pending SLM production data
- ADR-009 status updated to "Superseded by ADR-013"
- gemma-integration-analysis.md: header note that Node.js specifics
  (LiteRT-LM, daemon, PID) don't apply to gnoma's Go implementation
- TODO.md replaced with thin pointer to roadmap + stable backlog
- Deleted stale plan/spec files: m6-m7-closeout, m8-hooks-design

2026-05-07 15:06:54 +02:00


ADR-013: SLM-First Routing (Supersedes ADR-009)

Status: Accepted
Date: 2026-05-07
Supersedes: ADR-009 (Thompson Sampling for Multi-Armed Bandit)

Context

ADR-009 committed to Discounted Thompson Sampling as the primary routing intelligence for M9. That decision was made before SLM-driven task classification was on the roadmap.

With an SLM classifier (Phase 3 of the 2026-05-07 roadmap) as the preflight dispatcher, the routing stack changes fundamentally:

- The SLM has richer context than a Beta(α, β) distribution: task semantics, conversation history, model performance history, and cost envelope are all available at classification time.
- A bandit running in parallel with an SLM introduces competing signals: the bandit's numeric EMA score may contradict the SLM's intent classification without a clear resolution rule.
- The current heuristic tier order (CLI > local > API) in armTier() is a pragmatic placeholder that the SLM can supersede with semantically grounded decisions.
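For readers unfamiliar with the placeholder, here is a minimal sketch of a fixed tier ordering such as CLI > local > API. Only the name armTier() and the ordering come from this ADR; the Arm and ArmKind types, the numeric ranks, and orderByTier are assumptions made for illustration and may not match the real code.

```go
package main

import (
	"fmt"
	"sort"
)

// ArmKind is a hypothetical label for how a model is reached.
type ArmKind int

const (
	KindCLI ArmKind = iota
	KindLocal
	KindAPI
)

// Arm is a hypothetical routing candidate.
type Arm struct {
	Name string
	Kind ArmKind
}

// armTier returns a sort rank; a lower rank is preferred.
// Assumed mapping: CLI (0) before local (1) before API (2).
func armTier(a Arm) int {
	switch a.Kind {
	case KindCLI:
		return 0
	case KindLocal:
		return 1
	default:
		return 2
	}
}

// orderByTier returns a copy of arms sorted by tier, keeping the
// input order among arms that share a tier.
func orderByTier(arms []Arm) []Arm {
	out := append([]Arm(nil), arms...)
	sort.SliceStable(out, func(i, j int) bool {
		return armTier(out[i]) < armTier(out[j])
	})
	return out
}

func main() {
	arms := []Arm{
		{Name: "hosted-api", Kind: KindAPI},
		{Name: "ollama-local", Kind: KindLocal},
		{Name: "agent-cli", Kind: KindCLI},
	}
	for _, a := range orderByTier(arms) {
		fmt.Println(armTier(a), a.Name)
	}
}
```

The ordering is static: it ignores what the task is, which is the gap a semantic classifier is meant to close.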

Decision

SLM classifier first. Implement a TaskClassifier interface with two implementations: HeuristicClassifier (default, wraps existing ClassifyTask()) and SLMClassifier (Ollama HTTP via the existing openaicompat provider, opt-in via router.slm_model config key).
Defer Thompson Sampling. ADR-009's Thompson Sampling plan is deferred. It will be re-evaluated after the SLM classifier has been in production and generating real signals. The decision at that point will be one of:
- Keep bandit as a feedback loop the SLM reads (outcome telemetry → SLM context)
- Retire numeric EMA in favour of qualitative outcome summaries fed to the SLM
- Keep both with explicit, non-overlapping responsibilities
Go implementation constraint. The SLM runtime must respect CGO_ENABLED=0. LiteRT-LM, daemon processes, and CGO bindings are out of scope. Ollama HTTP is the only supported runtime — it is an opt-in user dependency, not a bundled one.

Alternatives Considered

Alternative A: Implement ADR-009 as planned, add SLM later

Pros: Incremental; Thompson Sampling is already designed
Cons: Two competing learning signals from day one. Unclear which wins on conflict. Wasted implementation effort if Thompson Sampling is retired post-SLM.

Alternative B: Full SLM takeover — retire bandit entirely

Pros: Single source of routing truth
Cons: Premature. SLM quality is unknown before production data. Bandit may still add value for cost/quality feedback within a tier where the SLM doesn't have preferences.

Alternative C: This ADR — SLM first, bandit deferred and re-evaluated

Pros: No competing signals. Real data informs the bandit re-evaluation. Implementation effort is not wasted on a system that may be retired.
Cons: Thompson Sampling (ADR-009) ships later than originally planned.

Consequences

Positive:

- Clean routing signal: one classifier, no conflicts
- SLM produces semantic task types that heuristic scoring cannot match
- Zero new runtime deps for users who don't have Ollama (heuristic fallback always active)
- CGO_ENABLED=0 constraint preserved

Negative:

- ADR-009 (Thompson Sampling) ships later, or not at all
- Users without Ollama get heuristic routing only (same as today)
- SLM quality depends on the model available in the user's Ollama installation

Neutral:

QualityTracker / EMA in feedback.go remains as infrastructure; no behavior change until Phase 4 re-evaluation decides its fate
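As a reminder of what an EMA-based quality score involves, here is a minimal sketch of an exponential moving average over per-arm outcomes. It is not the code in feedback.go: the struct fields, the Record method, the smoothing factor, and the choice to seed the average with the first observation are all assumptions for illustration.

```go
package main

import "fmt"

// QualityTracker here is an illustrative stand-in, not the type in feedback.go.
type QualityTracker struct {
	alpha  float64            // smoothing factor in (0, 1]; value is assumed
	scores map[string]float64 // per-arm EMA of observed outcomes
}

func NewQualityTracker(alpha float64) *QualityTracker {
	return &QualityTracker{alpha: alpha, scores: make(map[string]float64)}
}

// Record folds an outcome in [0, 1] into the arm's EMA and returns the
// updated score. The first observation for an arm seeds the average.
func (q *QualityTracker) Record(arm string, outcome float64) float64 {
	prev, seen := q.scores[arm]
	if !seen {
		q.scores[arm] = outcome
		return outcome
	}
	next := q.alpha*outcome + (1-q.alpha)*prev
	q.scores[arm] = next
	return next
}

func main() {
	qt := NewQualityTracker(0.5)
	qt.Record("local", 1.0)              // seeds the score at 1.0
	fmt.Println(qt.Record("local", 0.0)) // 0.5*0.0 + 0.5*1.0 = 0.5
}
```

A score like this is a single number per arm, which is why the re-evaluation has to decide whether it feeds the SLM, is replaced by qualitative summaries, or keeps a separate responsibility.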
