---
essential: milestones
status: complete
last_updated: 2026-04-06
project: gnoma
depends_on: [vision]
---
# Milestones
## Overview
| # | Name | Core Deliverable | Deps |
|---|------|-----------------|------|
| M1 | Core Engine | Pipe mode, Mistral, tools, agentic loop | — |
| M2 | Multi-Provider | All providers, config, dynamic switching | M1 |
| M3 | Security Firewall | Request/response scanning, redaction, incognito | M2 |
| M4 | Router Foundation | Arm registry, pools, task classifier, heuristic selection | M2 |
| M5 | TUI | Bubble Tea, 6 permission modes, config screen | M3, M4 |
| M6 | Context Intelligence | Local tokenizer, fixed context prefix, full compaction | M5 |
| M7 | Elfs | Router-integrated sub-agents, parallel work | M4, M6 |
| M8 | Extensibility | Hooks, skills, MCP client, MCP tool replaceability, plugins | M7 |
| M9 | Router Advanced | Bandit core, feedback, ensemble strategies, state persistence | M7 |
| M10 | Persistence & Serve | SQLite sessions, serve mode, coordinator | M7 |
| M11 | Task Learning | Pattern recognition, task suggestions, persistent tasks | M9 |
| M12 | Thinking, Structured Output, Notebook & Multimodality | Thinking, structured output with schema validation, NotebookEdit, multimodal I/O | M2 |
| M13 | Auth | OAuth PKCE, keyring, multi-account | M5 |
| M14 | Observability | Feature flags, telemetry, cost dashboards | M10 |
| M15 | Web UI | `gnoma web` CLI subcommand, browser UI via serve mode | M10 |
---
## M1: Core Engine (MVP)
**Scope:** First working assistant. CLI pipe mode. Mistral as reference provider. Bash + file tools (with 7 critical security checks). No TUI, no permissions, no config file.
**Deliverables:**
- [x] Architecture docs in `docs/essentials/`
- [ ] Foundation types (`internal/message/`)
- [ ] Streaming abstraction (`internal/stream/`)
- [ ] Provider interface + Mistral adapter
- [ ] Tool system: bash (with security checks), fs.read, fs.write, fs.edit, fs.glob, fs.grep
- [ ] Engine agentic loop (stream → tool → re-query → done)
- [ ] CLI pipe mode (`echo "list files" | gnoma`)
- [ ] System package inventory: detect installed tools/packages at startup, include in system prompt so the LLM knows what's available
**Exit criteria:** Pipe a coding question in, get a response that uses tools, answer on stdout.
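The agentic loop listed above (stream → tool → re-query → done) could look roughly like the following. This is a minimal sketch, not the planned implementation: `Message`, `ToolCall`, `Provider`, `Tool`, `Run`, and `scriptedProvider` are all hypothetical names, and streaming is collapsed into a single `Complete` call for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for the types planned in internal/message/.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// ToolCall is a request from the model to run a named tool.
type ToolCall struct {
	Name string
	Arg  string
}

// Provider returns the assistant text plus any tool calls it wants executed.
type Provider interface {
	Complete(history []Message) (string, []ToolCall, error)
}

// Tool is an assumed minimal tool signature.
type Tool func(arg string) (string, error)

// Run queries the provider, executes requested tools, appends their results,
// and re-queries until the provider answers without tool calls.
func Run(p Provider, tools map[string]Tool, prompt string, maxTurns int) (string, error) {
	history := []Message{{Role: "user", Content: prompt}}
	for turn := 0; turn < maxTurns; turn++ {
		text, calls, err := p.Complete(history)
		if err != nil {
			return "", err
		}
		history = append(history, Message{Role: "assistant", Content: text})
		if len(calls) == 0 {
			return text, nil // done: no further tool use requested
		}
		for _, c := range calls {
			out := "unknown tool: " + c.Name
			if tool, ok := tools[c.Name]; ok {
				res, err := tool(c.Arg)
				if err != nil {
					res = "error: " + err.Error()
				}
				out = res
			}
			history = append(history, Message{Role: "tool", Content: out})
		}
	}
	return "", errors.New("turn limit reached without a final answer")
}

// scriptedProvider asks for one tool call, then answers using the tool result.
type scriptedProvider struct{}

func (scriptedProvider) Complete(h []Message) (string, []ToolCall, error) {
	last := h[len(h)-1]
	if last.Role == "tool" {
		return "files: " + last.Content, nil, nil
	}
	return "", []ToolCall{{Name: "fs.glob", Arg: "*"}}, nil
}

func main() {
	tools := map[string]Tool{
		"fs.glob": func(string) (string, error) { return "a.go b.go", nil },
	}
	answer, err := Run(scriptedProvider{}, tools, "list files", 4)
	fmt.Println(answer, err)
}
```

The `maxTurns` bound is an assumption of the sketch; it keeps a misbehaving provider from looping forever.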
## M2: Multi-Provider
**Scope:** All remaining providers. TOML config with layered loading. Dynamic provider switching.
**Deliverables:**
- [ ] TOML config system (defaults → user → project → env → flags)
- [ ] API key resolution from env vars and config
- [ ] Anthropic provider (streaming + tool use + thinking blocks)
- [ ] OpenAI provider (streaming + tool use)
- [ ] Google provider (streaming + function calling, goroutine bridge)
- [ ] OpenAI-compat for Ollama and llama.cpp
- [ ] `--provider` / `--model` flag switching
**Exit criteria:** `echo "hello" | gnoma --provider openai` works. All 5+ providers functional.
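The layered loading order (defaults → user → project → env → flags) amounts to a last-writer-wins merge. The sketch below illustrates only the precedence rule under that assumption: `Layer` and `Resolve` are hypothetical names, the model value is a placeholder, and real loading would parse TOML files rather than use in-memory maps.

```go
package main

import "fmt"

// Layer is one hypothetical configuration source.
type Layer struct {
	Name   string
	Values map[string]string
}

// Resolve merges layers in order; later layers override earlier ones.
// It also records which layer supplied each final value.
func Resolve(layers ...Layer) (merged, origin map[string]string) {
	merged = map[string]string{}
	origin = map[string]string{}
	for _, l := range layers {
		for k, v := range l.Values {
			merged[k] = v
			origin[k] = l.Name
		}
	}
	return merged, origin
}

func main() {
	cfg, from := Resolve(
		Layer{"defaults", map[string]string{"provider": "mistral", "model": "placeholder-model"}},
		Layer{"user", map[string]string{"provider": "anthropic"}},
		Layer{"project", nil},
		Layer{"env", nil},
		Layer{"flags", map[string]string{"provider": "openai"}},
	)
	fmt.Println(cfg["provider"], "from", from["provider"])
	fmt.Println(cfg["model"], "from", from["model"])
}
```

Tracking the origin of each value is optional, but it makes it easier to explain to a user why a given setting is in effect.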
## M3: Security Firewall
**Scope:** Core security layer built into gnoma. Scans outgoing LLM requests and incoming tool results for sensitive data. Redacts or blocks. Incognito mode.
**Deliverables:**
- [ ] Secret scanner (gitleaks-derived, 40+ regex patterns, Shannon entropy detection)
- [ ] Unicode sanitization (NFKC + Cf/Co/Cn stripping, recursive on nested structs)
- [ ] Redactor (replace matched groups with `[REDACTED]`, preserve context)
- [ ] Configurable rules (regex patterns, action: redact/block/warn)
- [ ] Remaining bash security checks (checks 8-23 from CC bashSecurity.ts)
- [ ] Incognito mode: no persistence, no learning, no logging, optional local-only routing
- [ ] `--incognito` CLI flag
**Exit criteria:** Provider requests with embedded API keys get redacted. Incognito suppresses all persistence. Unicode attack vectors sanitized.
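Entropy-based detection and redaction can be illustrated with the standard library alone. In this sketch the candidate pattern, the 3.5 bits-per-byte threshold, and the function names are illustrative assumptions, not gnoma's actual rules; the real scanner is described as gitleaks-derived with 40+ patterns.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// ShannonEntropy returns the entropy of s in bits per byte.
func ShannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	var h float64
	n := float64(len(s))
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / n
			h -= p * math.Log2(p)
		}
	}
	return h
}

// candidate matches long token-like runs (assumed pattern).
var candidate = regexp.MustCompile(`[A-Za-z0-9_-]{20,}`)

// Redact replaces high-entropy candidates with [REDACTED] and leaves the
// surrounding text untouched so the context stays readable.
func Redact(text string, threshold float64) string {
	return candidate.ReplaceAllStringFunc(text, func(m string) string {
		if ShannonEntropy(m) >= threshold {
			return "[REDACTED]"
		}
		return m
	})
}

func main() {
	in := "key=q7Zp2LmX9vB4tR8sW1yK6dHc and id=aaaaaaaaaaaaaaaaaaaaaaaa"
	fmt.Println(Redact(in, 3.5))
}
```

The low-entropy `id` value survives while the random-looking key is replaced, which is the reason to pair entropy with pattern matching rather than redacting every long token.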
## M4: Router Foundation
**Scope:** Arm registry, limit pools, task classification, heuristic selection. Engine switches from direct provider calls to `router.Select()`.
**Deliverables:**
- [ ] Arm type (provider+model pair) with capability introspection
- [ ] Limit pools (RPM, RPD, tokens/day, cost caps, custom units)
- [ ] Pool tracker with optimistic reservation and scarcity multipliers
- [ ] Task classifier (10 types: Boilerplate, Generation, Refactor, Review, UnitTest, Planning, Orchestration, SecurityReview, Debug, Explain)
- [ ] Complexity scoring and value scoring
- [ ] Heuristic arm selection (score = quality × value / effective_cost)
- [ ] Background provider discovery (poll ollama, llama.cpp, API providers)
- [ ] Engine integration: `router.Select()` replaces direct provider calls
**Exit criteria:** Engine routes tasks through router. Limit pools track consumption. Task classification works for 10 types.
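The heuristic score (quality × value / effective_cost) can be sketched as below. The `Arm` fields, the definition of effective cost as nominal cost times a scarcity multiplier, and the handling of non-positive costs are assumptions made for illustration.

```go
package main

import "fmt"

// Arm is a provider+model pair. Field names are hypothetical.
type Arm struct {
	Provider, Model string
	Quality         float64 // expected quality for the task type, 0..1
	Cost            float64 // nominal cost per request, arbitrary units
	Scarcity        float64 // multiplier >= 1 that grows as the limit pool drains
}

// Score implements score = quality × value / effective_cost, where
// effective_cost = cost × scarcity is an assumption of this sketch.
func Score(a Arm, value float64) float64 {
	effective := a.Cost * a.Scarcity
	if effective <= 0 {
		return 0 // sketch-only guard: arms need a positive cost to be ranked
	}
	return a.Quality * value / effective
}

// Select returns the highest-scoring arm, or false if arms is empty.
func Select(arms []Arm, value float64) (Arm, bool) {
	var best Arm
	bestScore, found := 0.0, false
	for _, a := range arms {
		if s := Score(a, value); !found || s > bestScore {
			best, bestScore, found = a, s, true
		}
	}
	return best, found
}

func main() {
	arms := []Arm{
		{Provider: "p1", Model: "large", Quality: 0.9, Cost: 3, Scarcity: 1},
		{Provider: "p2", Model: "small", Quality: 0.6, Cost: 0.5, Scarcity: 1},
	}
	best, _ := Select(arms, 2)
	fmt.Println("plenty of quota:", best.Model)

	arms[1].Scarcity = 5 // the small arm's pool is nearly exhausted
	best, _ = Select(arms, 2)
	fmt.Println("small arm scarce:", best.Model)
}
```

Raising the scarcity multiplier shifts selection away from an arm before its pool is actually empty, which is the intended effect of folding pool state into the cost term.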
## M5: TUI
**Scope:** Interactive terminal UI. Full 6-mode permission system. Session management. In-app config. Incognito toggle.
**Deliverables:**
- [ ] Permission system with all 6 modes:
  - `default`: prompt for each tool invocation
  - `acceptEdits`: auto-allow file ops, prompt for bash/destructive
  - `bypass`: allow everything
  - `deny`: deny all unless explicit allow rule
  - `plan`: read-only tools only
  - `auto`: router task classification + tool risk scoring
- [ ] Permission rules with compound bash command decomposition (via `mvdan.cc/sh` AST)
- [ ] 7-step permission decision flow (deny gates → tool check → safety → mode → allow → passthrough → hooks)
- [ ] Bubble Tea TUI: chat panel, input, streaming output
- [ ] Status bar (provider, model, tokens, incognito indicator)
- [ ] Permission prompt overlay
- [ ] Model picker overlay
- [ ] In-app config editor (`/config` command)
- [ ] Incognito toggle (`/incognito` command)
- [ ] Interactive shell pane: `/shell` command or keybinding opens PTY-connected shell
  - For commands needing user input (sudo, ssh, git push with auth, passwd prompts)
  - Bash tool detects potentially interactive commands and suggests take-over
  - PTY-based execution for flagged commands
- [ ] Session management (channel-based)
**Exit criteria:** Launch TUI, chat interactively, 6 permission modes work, config editable in-app, incognito toggleable, `/shell` opens interactive terminal for password prompts.
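The six permission modes can be read as a single decision function over a tool request. The sketch below covers only the mode step, not the full 7-step flow; the `Request` fields, the `Decision` values, and the 0.3 risk threshold for `auto` are hypothetical.

```go
package main

import "fmt"

// Mode names follow the list above.
type Mode string

const (
	Default     Mode = "default"
	AcceptEdits Mode = "acceptEdits"
	Bypass      Mode = "bypass"
	Deny        Mode = "deny"
	Plan        Mode = "plan"
	Auto        Mode = "auto"
)

// Decision is the assumed outcome of the mode step.
type Decision string

const (
	Allow  Decision = "allow"
	Prompt Decision = "prompt"
	Block  Decision = "block"
)

// Request describes a tool invocation; the fields are hypothetical.
type Request struct {
	ReadOnly    bool
	FileOp      bool
	Destructive bool
	AllowRule   bool    // an explicit allow rule matched
	Risk        float64 // 0..1, assumed to come from tool risk scoring
}

// Decide maps a mode and a request to a decision.
func Decide(m Mode, r Request) Decision {
	switch m {
	case Bypass:
		return Allow
	case Deny:
		if r.AllowRule {
			return Allow
		}
		return Block
	case Plan:
		if r.ReadOnly {
			return Allow
		}
		return Block
	case AcceptEdits:
		if r.FileOp && !r.Destructive {
			return Allow
		}
		return Prompt
	case Auto:
		if r.Risk < 0.3 { // threshold is an assumption
			return Allow
		}
		return Prompt
	default: // "default" mode and any unknown mode prompt for everything
		return Prompt
	}
}

func main() {
	edit := Request{FileOp: true}
	for _, m := range []Mode{Default, AcceptEdits, Bypass, Deny, Plan, Auto} {
		fmt.Printf("%-12s %s\n", m, Decide(m, edit))
	}
}
```

Falling back to `Prompt` for unknown modes keeps a typo in configuration from silently widening permissions.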
## M6: Context Intelligence
**Scope:** Long sessions. Local tokenizer. Full compaction with both truncation and LLM summarization.
**Deliverables:**
- [ ] Local tokenizer for accurate token counting
- [ ] Token tracker with warning states (OK / Warning / Critical)
- [ ] Fixed context prefix: system prompt + loaded md files (CLAUDE.md, project docs) pinned as immutable prefix. Only conversation history after the prefix gets compacted.
- [ ] TruncateStrategy: drop oldest, preserve system + fixed prefix + recent
- [ ] SummarizeStrategy: spawn compaction elf, LLM-powered summary, image stripping, boundary messages
- [ ] Auto-compaction triggers (threshold-based, reactive on 413, circuit breaker after 3 failures)
- [ ] Pre/post compact hooks
- [ ] Tool result persistence (>50KB → disk, 2KB preview + filepath)
- [ ] Deferred tool loading (`ShouldDefer()`, full schema on demand)
- [ ] Post-compact restoration budget (50K total, 5K/file, 25K/skill)
**Exit criteria:** 100+ turn conversation stays coherent. Summarization produces useful summaries. Token counting within 5% of provider.
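The TruncateStrategy rule (drop oldest, preserve fixed prefix plus recent turns) can be sketched as a newest-first walk over the history. `Msg`, `Truncate`, and the token numbers are hypothetical; a real implementation would take counts from the local tokenizer.

```go
package main

import "fmt"

// Msg is a hypothetical message with a precomputed token count.
type Msg struct {
	Text   string
	Tokens int
}

// Truncate always keeps the pinned prefix, then keeps as many of the newest
// history messages as fit in the remaining budget. Older messages are dropped.
func Truncate(prefix, history []Msg, budget int) []Msg {
	used := 0
	for _, m := range prefix {
		used += m.Tokens
	}
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		if used+history[i].Tokens > budget {
			break
		}
		used += history[i].Tokens
		start = i
	}
	out := append([]Msg{}, prefix...)
	return append(out, history[start:]...)
}

func main() {
	prefix := []Msg{{"system prompt + pinned docs", 10}}
	history := []Msg{{"turn 1", 5}, {"turn 2", 5}, {"turn 3", 5}, {"turn 4", 5}}
	for _, m := range Truncate(prefix, history, 21) {
		fmt.Println(m.Text)
	}
}
```

Because the prefix is never touched, it stays byte-identical across compactions, which is what makes it usable as an immutable prefix.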
## M7: Elfs (Router-Integrated)
**Scope:** Sub-agents using router for provider selection. Parallel work. Feedback to router.
**Deliverables:**
- [x] Elf interface + BackgroundElf implementation
- [x] ElfManager: spawn, monitor, cancel, collect results
- [x] Router-integrated spawning (`router.Select()` picks arm per elf)
- [x] Parent ↔ elf communication via typed channels (elf.Progress)
- [x] Concurrent tool execution (read-only parallel via WaitGroup, writes serial)
- [x] `agent` tool: single elf spawn with tree progress view
- [x] `spawn_elfs` tool: batch N elfs in one call, all run in parallel
- [x] CC-style tree view: ├─/└─ branches, tool uses, tokens, activity, Done(duration)
- [x] Elf output truncated to 2000 chars for parent context protection
- [x] Elf results feed back to router as quality signals
- [x] Coordinator mode: orchestrator dispatches to worker elfs
**Exit criteria:** Parent spawns 3 elfs via `spawn_elfs`, all run in parallel (chosen by router), tree shows live progress, results synthesized.
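The concurrency rule for tool execution (read-only calls in parallel via WaitGroup, writes serial) might be sketched as follows. `Call` and `Execute` are hypothetical names, and running all reads before all writes is a simplification of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// Call is a hypothetical pending tool invocation.
type Call struct {
	Name     string
	ReadOnly bool
	Run      func() string
}

// Execute runs read-only calls concurrently, waits for them, then runs the
// remaining calls one at a time. Results keep the original call order.
func Execute(calls []Call) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		if !c.ReadOnly {
			continue
		}
		wg.Add(1)
		go func(i int, c Call) {
			defer wg.Done()
			results[i] = c.Run() // each goroutine writes a distinct index
		}(i, c)
	}
	wg.Wait()
	for i, c := range calls {
		if !c.ReadOnly {
			results[i] = c.Run() // writes never overlap with each other
		}
	}
	return results
}

func main() {
	calls := []Call{
		{"fs.read", true, func() string { return "read a.go" }},
		{"fs.write", false, func() string { return "wrote b.go" }},
		{"fs.grep", true, func() string { return "3 matches" }},
	}
	fmt.Println(Execute(calls))
}
```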
## M8: Extensibility
**Scope:** Hooks, skills, MCP client with tool replaceability, plugin system.
**Deliverables:**
- [x] Hook system: PreToolUse, PostToolUse, SessionStart/End, PreCompact, Stop
- [x] Hook protocol: stdin JSON, stdout JSON, exit codes (0=allow, 2=deny)
- [x] Hook command types: command (shell), prompt (LLM), agent (spawn elf)
- [x] Skill loading from .gnoma/skills/, ~/.config/gnoma/skills/, bundled, plugins
- [x] Skill frontmatter: YAML (name, description, whenToUse, allowedTools, paths)
- [ ] MCP client: JSON-RPC over stdio, tool discovery
- [ ] MCP tool naming: `mcp__{server}__{tool}`
- [ ] MCP tool replaceability: `replace_default` config swaps built-in tools
- [ ] Plugin system: plugin.json manifest, install/enable/disable lifecycle
- [x] `/batch` skill: decompose work into N units, spawn all via `spawn_elfs`, track progress (CC-inspired)
- [x] Coordinator mode prompt: fan-out guidance for parallel elf dispatch, concurrency rules (read vs write)
**Exit criteria:** MCP tools appear in gnoma. `replace_default` swaps built-ins. Skills invocable. Hooks fire on tool use. `/batch` decomposes and parallelizes work.
## M9: Router Advanced
**Scope:** Full bandit learning. Feedback collection. Ensemble execution strategies. State persistence.
**Deliverables:**
- [ ] Discounted Thompson Sampling (per-arm, per-task-type Beta distributions)
- [ ] Feedback collection: implicit (acceptance, edit distance, escalation) + explicit
- [ ] Delayed attribution for orchestration/planning tasks
- [ ] Execution strategies: SingleArm, CascadeWithReview, ParallelEnsemble, MultiRoundSynthesis
- [ ] Strategy selection as learned routing decision
- [ ] Background arm benchmarking (TTFT, tok/s)
- [ ] State persistence (gob, versioned schema, atomic writes, CRC32)
- [ ] Cold start: shipped default.state with embedded priors
- [ ] Heuristic fallback for <5 observations per arm-task pair
**Exit criteria:** Bandit converges after ~50 observations. Ensemble outperforms single-arm on complex tasks. State persists across restarts.
## M10: Persistence & Serve
**Scope:** SQLite session persistence. Serve mode. Coordinator mode.
**Deliverables:**
- [ ] SQLite session storage (messages, parentUuid chain, tombstones)
- [ ] Session memory: background elf extracts notes from conversation
- [ ] Incognito enforcement: sessions NOT persisted
- [ ] Serve mode: Unix socket listener, spawn session goroutine per client
- [ ] Coordinator mode: orchestrator dispatches to restricted worker elfs
- [ ] Task framework: registered tasks with lifecycle (pending/running/completed/failed), abort controllers (CC-inspired AppState.tasks)
- [ ] Task notification system: completed background elfs inject `<task-notification>` messages into parent conversation (CC-inspired)
- [ ] StreamingToolExecutor: concurrent-safe tool classification, sibling abort on failure (CC-inspired)
- [ ] Git worktree isolation: `isolation: "worktree"` gives each elf a separate working copy (CC-inspired)
**Exit criteria:** Resume yesterday's conversation. External client connects via serve mode. Task notifications flow from background elfs to parent.
## M11: Task Learning
**Scope:** Detect recurring task patterns. Suggest persistent tasks. Refinement loop.
**Deliverables:**
- [ ] Pattern detector: observe turn sequences, identify repeats (≥3 times)
- [ ] Task suggestion UX: prompt user to save as persistent task
- [ ] Persistent task definitions: parameterized sequences, stored in .gnoma/tasks/ or ~/.config/gnoma/tasks/
- [ ] `/task <name> [args]` execution command
- [ ] Router feedback integration: learn which arm works best per task step
- [ ] Task refinement: re-split tasks, measure improvement
**Exit criteria:** gnoma suggests a persistent task after 3+ repetitions. `/task release v1.2.0` executes a saved workflow.
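One simple way to approximate the pattern detector is to count fixed-length windows of actions and report those seen at least three times. This is a sketch under that assumption; `Repeats`, the action labels, and the window-based approach are illustrative, and overlapping windows are counted separately.

```go
package main

import (
	"fmt"
	"strings"
)

// Repeats returns every length-n window of actions that occurs at least
// min times, in order of first appearance.
func Repeats(actions []string, n, min int) []string {
	if n < 1 {
		return nil
	}
	counts := map[string]int{}
	var order []string
	for i := 0; i+n <= len(actions); i++ {
		key := strings.Join(actions[i:i+n], " > ")
		if counts[key] == 0 {
			order = append(order, key)
		}
		counts[key]++
	}
	var out []string
	for _, k := range order {
		if counts[k] >= min {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	log := []string{
		"test", "tag", "push",
		"edit",
		"test", "tag", "push",
		"test", "tag", "push",
	}
	fmt.Println(Repeats(log, 3, 3))
}
```

A sequence reported here would be the candidate shown to the user in the task suggestion prompt.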
## M12: Thinking, Structured Output, Notebook & Multimodality
**Deliverables:**
- [ ] Thinking mode (disabled / enabled with budget / adaptive)
- [ ] Thinking block streaming and TUI display
- [ ] Structured output with JSON schema validation
- [ ] Retry logic for schema validation failures
- [ ] NotebookEdit tool: read/write/edit Jupyter notebook cells (.ipynb)
- [ ] Multimodal input: image support (Anthropic image blocks, OpenAI content parts, Google inline data)
- [ ] Multimodal input: audio support (where provider supports it)
- [ ] Multimodal output: image rendering in TUI (sixel/kitty protocol)
## M13: Auth
**Deliverables:**
- [ ] OAuth 2.0 + PKCE flow (browser → callback → token exchange)
- [ ] Proactive token refresh (before expiry)
- [ ] OS keyring integration for credential storage
- [ ] Multi-account support per provider
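The PKCE part of the OAuth flow is defined by RFC 7636: the client generates a random code verifier and sends its S256 challenge, `BASE64URL(SHA256(verifier))`, with the authorization request. A minimal sketch of those two steps, with hypothetical function names, follows.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// NewVerifier returns a random code verifier: 32 random bytes encoded as
// 43 unpadded base64url characters.
func NewVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// Challenge derives the S256 code challenge from a verifier.
func Challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	v, err := NewVerifier()
	if err != nil {
		fmt.Println("verifier:", err)
		return
	}
	fmt.Println("code_verifier: ", v)
	fmt.Println("code_challenge:", Challenge(v))
}
```

The verifier stays on the client until the token exchange, so it is a natural candidate for in-memory storage only, separate from the long-lived tokens kept in the OS keyring.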
## M14: Observability
**Deliverables:**
- [ ] Feature flag system (local config + optional remote)
- [ ] Opt-in analytics (event queue, local-only by default)
- [ ] Usage dashboards (token spend, provider usage, tool frequency)
- [ ] Cost tracking per provider/model
## M15: Web UI
**Deliverables:**
- [ ] `gnoma web` CLI subcommand starts local web server
- [ ] Connects to serve mode backend (M10 prerequisite)
- [ ] Chat interface with streaming, tool output, permission prompts
## Future
- Voice input/output via provider audio APIs
- Collaborative sessions (multiple humans + elfs)
- Plugin marketplace
- Remote agent execution
- Federated learning for router priors (opt-in, anonymized)
## Changelog
- 2026-04-02: Initial version (M1-M11)
- 2026-04-03: Restructured to M1-M15. Split providers/TUI. Added Security (M3), Router Foundation (M4), Router Advanced (M9), Task Learning (M11). Full 6 permission modes. Full compaction. CC pattern integration.