
vikingowl d3990214a5 docs: update essentials for router, security, task learning

Restructure milestones from M1-M11 to M1-M15:
- M3: Security Firewall (secret scanner, incognito mode)
- M4: Router Foundation (arm registry, pools, task classifier)
- M5: TUI with full 6 permission modes
- M6: Full compaction (truncate + LLM summarization)
- M9: Router Advanced (bandit learning, ensemble strategies)
- M11: Task Learning (pattern detection, persistent tasks)

Add ADR-007 through ADR-012 for security-as-core, router split,
Thompson Sampling, MCP replaceability, task learning, incognito.

Add risks R-010 through R-015 for router, security, feedback,
task learning, ensemble quality, shell parser.

Update architecture dependency graph with security, router,
elf, hook, skill, mcp, plugin, tasklearn packages.

Update domain model with Router, Arm, LimitPool, Firewall entities.

2026-04-03 10:47:11 +02:00

3.5 KiB

essential: risks
status: complete
last_updated: 2026-04-02
project: gnoma
depends_on:

Risk / Unknowns

ID	Risk	Severity	Mitigation	Status
R-001	SDK breaking changes — provider SDKs are pre-1.0 and may change APIs	Medium	Pin versions, integration tests per provider, adapter layer absorbs changes	Open
R-002	Google range-to-pull bridge goroutine leak — context cancellation edge cases	Medium	Thorough testing with `testing/synctest`, always select on `ctx.Done()`	Open
R-003	Thinking block round-trip fidelity — Anthropic signatures must survive serialization	Medium	Unit tests with real signature values, golden file tests	Open
R-004	Tool call ID generation inconsistency — Google/Ollama may return empty IDs	Low	Generate UUID if provider returns empty, documented in provider adapter	Open
R-005	Mistral SDK 2.2.0 stability — user-maintained SDK, recently updated	Low	User maintains it, can fix bugs directly. Integration tests catch regressions.	Accepted
R-006	Bubble Tea v2 maturity — v2 is relatively new	Low	Pin version, fallback to v1 if blockers. TUI is last milestone item.	Open
R-007	Multi-provider routing complexity — coordinating elfs on different providers with different capabilities	High	Design routing interface early (M4), start simple (manual provider assignment), add rules incrementally	Open
R-008	Context compaction coherence — summarization may lose critical details	Medium	Truncation as safe default, summarization opt-in, compact boundaries for recovery	Open
R-009	Permission prompt UX in pipe mode — no TUI for interactive prompts	Low	Default to `allow` or `deny` in pipe mode, require explicit flag	Open
R-010	Router complexity — bandit tuning, cold start problem	High	Ship default.state with embedded priors, heuristic fallback for <5 observations	Open
R-011	Security false positives — blocking legitimate content	Medium	Warn-first mode, user override per-pattern, configurable sensitivity	Open
R-012	Feedback attribution — delayed/noisy signals for orchestration tasks	Medium	Neutral default for missing signals, ensemble contribution rank as strong signal	Open
R-013	Task learning privacy — pattern data persistence	Low	Patterns stored locally only, cleared in incognito mode	Open
R-014	Ensemble synthesis quality — depends heavily on synthesis prompt	Medium	Invest in prompt engineering, A/B test with polisher arm	Open
R-015	Shell parser dependency — `mvdan.cc/sh` for compound command decomposition	Low	Well-maintained Go package, fallback to regex-based decomposition if needed	Open

Open Questions

- How should routing rules be expressed in config? Per-task rules, model capability tags, or cost-based? Needs research before M5.
- Which local tokenizer library to use? (tiktoken port, sentencepiece, or provider-specific)
- Serve mode protocol: choose what fits best when implementing M10.
- What automated quality evaluation to use for router feedback? (compile check, linter, self-consistency, small local judge model)
- ~~Should gnoma embed a tokenizer?~~ → Yes, include local tokenizer (M6)
- ~~Session persistence format?~~ → SQLite (M10)
- ~~Mistral SDK as long-term reference?~~ → Yes for now, revisit after M2

Changelog

- 2026-04-02: Initial version
- 2026-04-03: Added R-010 through R-015 for router, security, feedback, task learning, ensemble quality, shell parser
