docs: refresh README/CONTRIBUTING/AGENTS/TODO, add LICENSE, drop obsolete files

Top-level docs were stale and the .gitea/ issue templates referenced a workflow that is no longer in use.

- README: rewrite around the current feature set (SLM routing, profiles, plugin TOFU, SafeProvider boundary, current model defaults). Add a pre-built-binary install section plus Docker (ghcr.io) install path for users without a Go toolchain. Document the GitHub mirror.
- CONTRIBUTING: drop the dead issue-template reference, note Gitea upstream + GitHub mirror split, expand the package map and test-target table.
- AGENTS: rebuild as a domain glossary (Elf / Arm / Turn / SafeProvider / Incognito / Profile) plus non-obvious conventions an outside agent needs and would not infer from the code.
- TODO: trim completed waves into a History section, fix a broken link to the never-written Wave 3 plan file, surface active backlog.
- docs/essentials/INDEX: add ADR-004 (PostToolUse hook ordering) to the ADR list.
- LICENSE + NOTICE: adopt Apache License 2.0. Patent grant matters because gnoma bundles SDKs from Anthropic / OpenAI / Google / Mistral and ships derivative tooling that runs untrusted MCP servers.
- Delete .gitea/issue_template/ and gemma-integration-analysis.md (the latter is obsolete per its own preamble: Node.js-specific notes that don't apply to the Go implementation).
2026-05-20 03:13:40 +02:00
parent 99fa0ff08e
commit 5170c73dac
10 changed files with 640 additions and 548 deletions
@@ -1,58 +0,0 @@
-name: Bug Report
-about: Report something that isn't working correctly
-labels:
-  - bug
-body:
-  - type: textarea
-    id: description
-    attributes:
-      label: Description
-      description: What happened? What did you expect?
-    validations:
-      required: true
-  - type: textarea
-    id: reproduction
-    attributes:
-      label: Steps to reproduce
-      description: Minimal steps to trigger the issue
-      placeholder: |
-        1. Run `gnoma --provider anthropic`
-        2. Type "..."
-        3. See error
-    validations:
-      required: true
-  - type: input
-    id: version
-    attributes:
-      label: gnoma version
-      description: Output of `gnoma --version`
-      placeholder: "gnoma 0.1.0 (abc1234, 2026-04-12)"
-    validations:
-      required: true
-  - type: input
-    id: os
-    attributes:
-      label: OS / Architecture
-      placeholder: "Linux x86_64 / macOS arm64 / Windows amd64"
-    validations:
-      required: true
-  - type: dropdown
-    id: provider
-    attributes:
-      label: Provider
-      options:
-        - mistral
-        - anthropic
-        - openai
-        - google
-        - ollama
-        - llamacpp
-        - N/A
-    validations:
-      required: false
-  - type: textarea
-    id: logs
-    attributes:
-      label: Relevant logs
-      description: Run with `--verbose` for debug output
-      render: shell
@@ -1,42 +0,0 @@
-name: Feature Request
-about: Suggest an improvement or new capability
-labels:
-  - enhancement
-body:
-  - type: textarea
-    id: problem
-    attributes:
-      label: Problem
-      description: What are you trying to do that gnoma doesn't support well?
-    validations:
-      required: true
-  - type: textarea
-    id: solution
-    attributes:
-      label: Proposed solution
-      description: How would you like this to work?
-    validations:
-      required: true
-  - type: textarea
-    id: alternatives
-    attributes:
-      label: Alternatives considered
-      description: Other approaches you've thought about
-    validations:
-      required: false
-  - type: dropdown
-    id: area
-    attributes:
-      label: Area
-      options:
-        - providers
-        - tools
-        - router
-        - TUI
-        - MCP / plugins
-        - elfs (sub-agents)
-        - security
-        - config
-        - other
-    validations:
-      required: false
@@ -1,26 +1,75 @@
 # AGENTS.md

-## Domain Terminology
- **Elf**: An agent instance.
- **Turn**: A complete sequence of agentic reasoning and tool execution.
- **Routing Arm**: A specific model/provider selected by the `Router` for a task.
- **Stream Event**: Discrete updates during LLM generation (e.g., `EventTextDelta`, `EventToolCallStart`, `EventToolResult`).
+Conventions for AI assistants working in this repository. CLAUDE.md
+covers Go style, commits, and TDD policy; this file adds gnoma-specific
+domain knowledge those rules do not capture.

-## Build & Test Targets
- **Run**: `make run`
- **Test (Verbose)**: `make test-v`
- **Integration Tests**: `make test-integration` (requires `//go:build integration`)
+## Domain glossary

-## Key Dependencies
- **Mistral**: `github.com/VikingOwl91/mistral-go-sdk`
- **Anthropic**: `github.com/anthropics/anthropic-sdk-go`
- **OpenAI**: `github.com/openai/openai-go`
- **Google GenAI**: `google.golang.org/genai`
- **TUI**: `charm.land/bubbletea/v2`, `charm.land/lipgloss/v2`
- **Other**: `charm.land/bubbles/v2`, `charm.land/glamour/v2`, `github.com/pkoukk/tiktoken-go`
+| Term | Meaning |
+|---|---|
+| **Elf** | A sub-agent instance, spawned via `spawn_elfs`. |
+| **Turn** | One complete `stream → tool → re-query` cycle in the engine. |
+| **Arm** | A `(provider, model)` pair the router can select. Registered with cost and capability metadata. |
+| **Router** | Multi-armed-bandit selector that picks an Arm per Turn from the registered set. |
+| **SLM** | Small language model running locally for prompt classification and trivial-task execution. |
+| **Stream Event** | Discriminated-union update emitted while a provider streams: `EventTextDelta`, `EventToolCallStart`, `EventToolResult`, etc. See `internal/stream/event.go`. |
+| **SafeProvider** | The sealed boundary that gates outbound provider calls — every Provider implementation embeds the unexported marker. See `internal/security`. |
+| **Incognito** | Per-turn mode that disables session persistence and router learning. |
+| **Profile** | A named config overlay under `~/.config/gnoma/profiles/`. Switches keys, models, and per-profile router quality data. |

-## Environment Variables
- `MISTRAL_API_KEY`: Required for Mistral provider.
- `ANTHROPIC_API_KEY`: Required for Anthropic provider.
- `OPENAI_API_KEY`: Required for OpenAI provider.
- `GOOGLE_API_KEY`: Required for Google provider.
+## Build & test targets (beyond standard)
+
+| Target | Purpose |
+|---|---|
+| `make test-v` | Verbose unit tests |
+| `make test-integration` | Runs `//go:build integration` tests (real API calls) |
+| `make check` | fmt + vet + lint + test (use before committing) |
+| `go test -bench=. ./internal/router/` | Router benchmarks |
+
+## Provider env vars
+
+| Provider | Primary | Alternative |
+|---|---|---|
+| Anthropic | `ANTHROPIC_API_KEY` | `ANTHROPICS_API_KEY` |
+| OpenAI | `OPENAI_API_KEY` | — |
+| Google | `GEMINI_API_KEY` | `GOOGLE_API_KEY` |
+| Mistral | `MISTRAL_API_KEY` | — |
+
+`GNOMA_PROVIDER` and `GNOMA_MODEL` override the resolved config.
+
+## Non-obvious conventions
+
+- **Discriminated unions** are structs with a `Type` field and pointer
+  payloads — not Go interfaces. See `internal/stream/event.go` and
+  `internal/message`.
+- **Pull-based iterators** follow the `Next() / Current() / Err() / Close()`
+  shape. Streams in `internal/provider/*/stream.go` are the canonical examples.
+- **`json.RawMessage`** flows through `tool.Definition.Parameters` and tool
+  arguments untouched — never marshal/unmarshal in the middle.
+- **Capabilities and ContextWindow** come from the per-provider
+  `inferXxxModelCapabilities` functions in `internal/provider`; when you
+  update a model list, also update these tables and the `ratelimits.go` map.
+- **Hook ordering** matters for `PostToolUse`. See ADR-004.
+- **Plugin trust** is TOFU pinning — see `internal/plugin/pinstore.go` and
+  ADR-003.
+
+## Sub-agent (elf) etiquette
+
+When spawning elfs:
+
+- One `spawn_elfs` call for all parallel work; never spawn one at a time.
+- Read-only tasks on disjoint files parallelize cleanly.
+- Writes to the same file must be sequenced into one elf.
+- Cap each batch at 5–7 elfs.
+
+See `internal/skill/skills/batch.md` for the canonical batching template.
+
+## Reference docs
+
+- Architecture map: `docs/essentials/INDEX.md`
+- ADRs: `docs/essentials/decisions/`
+- Profiles: `docs/profiles.md`
+- SLM backends: `docs/slm-backends.md`
+- Plugin trust: `docs/plugins-trust.md`
+- Router benchmarks: `docs/benchmarks/README.md`
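
The two conventions named in the new AGENTS.md (struct-based discriminated unions and `Next() / Current() / Err() / Close()` iterators) could look roughly like the sketch below. Only the event names `EventTextDelta` and `EventToolCallStart` and the four method names come from the diff; the payload types, field names, and the `sliceStream` helper are hypothetical and are not taken from `internal/stream/event.go`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// EventType is the discriminant. The constant names appear in the glossary;
// everything else in this file is an illustrative assumption.
type EventType int

const (
	EventTextDelta EventType = iota
	EventToolCallStart
)

// TextDelta and ToolCallStart are hypothetical payload types.
type TextDelta struct{ Text string }
type ToolCallStart struct{ Name string }

// Event is a discriminated union expressed as a struct: Type says which
// pointer payload is set, and the other payloads stay nil.
type Event struct {
	Type      EventType
	TextDelta *TextDelta
	ToolCall  *ToolCallStart
}

var errClosed = errors.New("stream closed")

// sliceStream is a toy pull-based iterator over a fixed slice of events.
type sliceStream struct {
	events []Event
	idx    int // index of the next event to hand out
	cur    Event
	err    error
	closed bool
}

// Next advances to the next event and reports whether one is available.
func (s *sliceStream) Next() bool {
	if s.closed {
		s.err = errClosed
		return false
	}
	if s.idx >= len(s.events) {
		return false
	}
	s.cur = s.events[s.idx]
	s.idx++
	return true
}

// Current returns the event selected by the last successful Next call.
func (s *sliceStream) Current() Event { return s.cur }

// Err returns the error that stopped iteration, if any.
func (s *sliceStream) Err() error { return s.err }

// Close marks the stream as finished; later Next calls fail with errClosed.
func (s *sliceStream) Close() error {
	s.closed = true
	return nil
}

// collectText drains the stream, switching on the discriminant rather than
// using a type switch over an interface.
func collectText(s *sliceStream) (string, error) {
	defer s.Close()
	var b strings.Builder
	for s.Next() {
		ev := s.Current()
		switch ev.Type {
		case EventTextDelta:
			b.WriteString(ev.TextDelta.Text)
		case EventToolCallStart:
			fmt.Fprintf(&b, "[tool:%s]", ev.ToolCall.Name)
		}
	}
	return b.String(), s.Err()
}

func main() {
	s := &sliceStream{events: []Event{
		{Type: EventTextDelta, TextDelta: &TextDelta{Text: "hel"}},
		{Type: EventToolCallStart, ToolCall: &ToolCallStart{Name: "bash"}},
		{Type: EventTextDelta, TextDelta: &TextDelta{Text: "lo"}},
	}}
	out, err := collectText(s)
	fmt.Println(out, err)
}
```

The consumer loop checks `Err()` once after `Next()` returns false, which is the usual reason for separating the two: a drained stream and a failed stream both end the loop, and only `Err()` tells them apart.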
@@ -1,5 +1,10 @@
 # Contributing to gnoma

+The upstream repository lives at
+<https://somegit.dev/Owlibou/gnoma> and is mirrored to
+<https://github.com/VikingOwl91/gnoma>. PRs are accepted on the upstream
+(Gitea) instance; the GitHub mirror is read-only.
+
 ## Setup

 ```sh
@@ -11,34 +16,43 @@ make lint    # requires golangci-lint

 ## Development workflow

-1. Create a branch from `main`
-2. Write tests first (TDD) — table-driven, `t.TempDir()` for filesystem tests
-3. `make check` (fmt + vet + lint + test) must pass
-4. Commit with conventional messages: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`
+1. Branch from `main`.
+2. Write tests first (TDD). Table-driven where possible, `t.TempDir()` for
+   filesystem tests, `testing/synctest` for concurrent ones.
+3. `make check` (fmt + vet + lint + test) must pass.
+4. Conventional commits: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`,
+   `chore:`. **No co-signing or "Generated-by" trailers.**

 ## Code style

- Go 1.26 idioms (`new(expr)`, `errors.AsType[E]`)
- Structured logging with `log/slog`
- `json.RawMessage` for tool schemas (zero-cost passthrough)
- Functional options for complex configuration
- Short, lowercase package names — no underscores
+- Go 1.26 idioms (`new(expr)`, `errors.AsType[E]`, `sync.WaitGroup.Go`).
+- Structured logging with `log/slog`.
+- `json.RawMessage` for tool schemas (zero-cost passthrough).
+- Functional options for complex configuration.
+- Short, lowercase package names — no underscores.
+- Discriminated unions via struct + type discriminant, not interfaces.
+- Pull-based stream iterators: `Next() / Current() / Err() / Close()`.

 ## Testing

- Unit tests: `make test`
- Integration tests (require API keys): `make test-integration`
- Coverage: `make cover`
- Benchmarks: `go test -bench=. ./internal/router/`
+| Command | What it runs |
+|---|---|
+| `make test` | unit tests |
+| `make test-integration` | tests behind `//go:build integration` — requires real API keys |
+| `make cover` | coverage → `coverage.html` |
+| `make lint` | `golangci-lint run ./...` |
+| `make check` | fmt + vet + lint + test |
+| `go test -bench=. ./internal/router/` | router benchmarks |

-Integration tests use `//go:build integration` and are skipped by default.
+Integration tests are skipped by default.

 ## Architecture

-Read `docs/essentials/INDEX.md` before making architectural changes. Key packages:
+Read [`docs/essentials/INDEX.md`](docs/essentials/INDEX.md) before changing
+architectural boundaries. Key packages:

 | Package | Purpose |
-|---------|---------|
+|---|---|
 | `internal/engine` | Agentic loop (stream → tool → re-query) |
 | `internal/router` | Multi-armed bandit arm selection |
 | `internal/provider` | LLM provider adapters |
@@ -46,8 +60,24 @@ Read `docs/essentials/INDEX.md` before making architectural changes. Key package
 | `internal/mcp` | MCP client (JSON-RPC over stdio) |
 | `internal/plugin` | Plugin manifest, loader, manager |
 | `internal/elf` | Sub-agent (elf) system |
-| `internal/tui` | Bubble Tea terminal UI |
+| `internal/security` | SafeProvider boundary, firewall, output scanner |
+| `internal/skill` | Skill registry and templating |
+| `internal/slm` | Small-language-model classifier + arm |
+| `internal/tui` | Bubble Tea v2 terminal UI |

-## Issues
+ADRs live in [`docs/essentials/decisions/`](docs/essentials/decisions/).

-Use the issue templates when filing bugs or requesting features. Include reproduction steps, expected behavior, and gnoma version (`gnoma --version`).
+## Reporting issues
+
+File issues on the upstream Gitea instance with:
+
+- A short reproduction (commands, prompts, configs that triggered the bug).
+- Expected vs. actual behavior.
+- `gnoma --version` output and OS / architecture.
+- Provider and model in use, if relevant.
+- `--verbose` log output if it sheds light.
+
+## License
+
+By contributing you agree your work is licensed under the
+[Apache License 2.0](LICENSE).
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
@@ -0,0 +1,5 @@
+gnoma
+Copyright 2026 vikingowl
+
+This product includes software developed at the gnoma project
+(https://somegit.dev/Owlibou/gnoma).
@@ -1,234 +1,153 @@
 # gnoma

-**A provider-agnostic agentic coding assistant built in Go.** gnoma routes tasks to the best available LLM — cloud or local — through a multi-armed bandit router, while tools, hooks, skills, MCP servers, and plugins keep it extensible. Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called **elfs** (elf owl).
+**A provider-agnostic agentic coding assistant in Go.** gnoma routes each prompt
+to the best available model — cloud or local — through a multi-armed bandit
+router, executes tools on your behalf, and stays extensible through hooks,
+skills, MCP servers, and plugins.
+
+Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called
+**elfs** (elf owl).
+
+- **Upstream:** <https://somegit.dev/Owlibou/gnoma>
+- **GitHub mirror:** <https://github.com/VikingOwl91/gnoma>
+
+---
+
+## Install
+
+### Pre-built binary (no Go toolchain required)
+
+Releases are built by [GoReleaser](.goreleaser.yml) for
+`linux`, `darwin`, and `windows` × `amd64`/`arm64` as static (`CGO_ENABLED=0`)
+archives. Until the first tag is cut, see "Build from source" below.
+
+Once releases are published:
+
+```sh
+# Pick the archive matching your OS/arch from the releases page:
+#   https://somegit.dev/Owlibou/gnoma/releases   (upstream)
+#   https://github.com/VikingOwl91/gnoma/releases (mirror)
+
+# Linux/macOS one-liner (substitute the asset URL):
+curl -fsSL <ARCHIVE_URL> | tar -xz -C /tmp
+sudo mv /tmp/gnoma /usr/local/bin/
+gnoma --version
+```
+
+Windows: download the `_windows_*.zip`, extract `gnoma.exe`, and put it on
+`%PATH%`.
+
+### Docker
+
+Multi-arch images (`linux/amd64`, `linux/arm64`) are published to GitHub
+Container Registry on each tagged release:
+
+```sh
+docker pull ghcr.io/vikingowl91/gnoma:latest
+docker run --rm -it \
+  -v "$PWD:/workspace" \
+  -e ANTHROPIC_API_KEY \
+  ghcr.io/vikingowl91/gnoma:latest --version
+```
+
+Mount your project at `/workspace` (the image's working directory) and pass
+provider keys via `-e`.
+
+### Go users
+
+```sh
+go install somegit.dev/Owlibou/gnoma/cmd/gnoma@latest   # latest tagged
+go install somegit.dev/Owlibou/gnoma/cmd/gnoma@main     # bleeding edge
+```
+
+### Build from source
+
+```sh
+git clone https://somegit.dev/Owlibou/gnoma && cd gnoma
+make build       # → ./bin/gnoma
+make install     # → $GOPATH/bin/gnoma
+```
+
+Requires Go 1.26+.
+
+---

 ## Quickstart

 ```sh
-# Install
-go install somegit.dev/Owlibou/gnoma/cmd/gnoma@latest
+# Set at least one provider key (or run a local model — see Providers below).
+export ANTHROPIC_API_KEY=sk-ant-...

-# Or build from source
-git clone https://somegit.dev/Owlibou/gnoma && cd gnoma
-make build    # binary at ./bin/gnoma
-
-# Set at least one provider key
-export ANTHROPIC_API_KEY=sk-ant-...   # or OPENAI_API_KEY, MISTRAL_API_KEY, GEMINI_API_KEY
-
-# Run
-gnoma                                 # interactive TUI
-echo "list files" | gnoma             # pipe mode
-gnoma --provider ollama               # use a local model
+gnoma                              # interactive TUI
+echo "list files" | gnoma          # pipe / one-shot mode
+gnoma --provider ollama            # use a local model
+gnoma --version
 ```

-## Build
+Inside the TUI, `Ctrl+X` toggles **incognito** (no session saved, no router
+learning); `/help` lists slash commands; `Esc` cancels an in-flight turn.

-```sh
-make build          # ./bin/gnoma
-make install        # $GOPATH/bin/gnoma
-```
+---

 ## Providers

-### Anthropic
+| Provider | Env var | Default model | Also available |
+|---|---|---|---|
+| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-6` | `claude-opus-4-7`, `claude-haiku-4-5-20251001` |
+| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` | `gpt-5.5-pro`, `gpt-5.2`, `gpt-5.2-chat-latest` |
+| Google (Gemini) | `GEMINI_API_KEY` (alt: `GOOGLE_API_KEY`) | `gemini-3.5-flash` | `gemini-3.1-pro-preview`, `gemini-3.1-flash-lite` |
+| Mistral | `MISTRAL_API_KEY` | `mistral-large-latest` (Mistral Large 3) | `mistral-medium-3.5`, `magistral-medium-2509` |
+| Ollama (local) | — | `qwen3:8b` (override with `--model`) | any model on your Ollama instance |
+| llama.cpp (local) | — | reported by `/v1/models` | n/a |
+| Subprocess (`claude`, `gemini`, `agy` CLIs) | provider-specific | binary name | configurable via `[cli_agents]` |
+
+Override per-invocation:

 ```sh
-export ANTHROPIC_API_KEY=sk-ant-...
-./bin/gnoma --provider anthropic
-./bin/gnoma --provider anthropic --model claude-opus-4-5-20251001
+gnoma --provider anthropic --model claude-opus-4-7
+gnoma --provider openai    --model gpt-5.5-pro     # GPT-5.5 is the default; pro is the higher-accuracy tier
+gnoma --provider google    --model gemini-3.1-pro-preview
+gnoma --provider ollama    --model qwen2.5-coder:3b
+gnoma --provider llamacpp                          # model picked from server
 ```

-Integration tests hit the real API — keep a key in env:
+`gnoma providers` prints every discovered provider, model, and CLI agent.
+
+### Local models
+
+Start your local server, then point gnoma at it:

 ```sh
-go test -tags integration ./internal/provider/...
-```
+# Ollama (default http://localhost:11434/v1)
+ollama pull qwen2.5-coder:3b
+gnoma --provider ollama --model qwen2.5-coder:3b

---
-
-### OpenAI
-
-```sh
-export OPENAI_API_KEY=sk-proj-...
-./bin/gnoma --provider openai
-./bin/gnoma --provider openai --model gpt-4o
-```
-
---
-
-### Mistral
-
-```sh
-export MISTRAL_API_KEY=...
-./bin/gnoma --provider mistral
-```
-
---
-
-### Google (Gemini)
-
-```sh
-export GEMINI_API_KEY=AIza...
-./bin/gnoma --provider google
-./bin/gnoma --provider google --model gemini-2.0-flash
-```
-
---
-
-### Ollama (local)
-
-Start Ollama and pull a model, then:
-
-```sh
-./bin/gnoma --provider ollama --model gemma4:latest
-./bin/gnoma --provider ollama --model qwen3:8b     # default if --model omitted
-```
-
-Default endpoint: `http://localhost:11434/v1`. Override via config or env:
-
-```sh
-# .gnoma/config.toml
-[provider]
-default = "ollama"
-model   = "gemma4:latest"
-
-[provider.endpoints]
-ollama = "http://myhost:11434/v1"
-```
-
---
-
-### llama.cpp (local)
-
-Start the llama.cpp server:
-
-```sh
+# llama.cpp (default http://localhost:8080/v1)
 llama-server --model /path/to/model.gguf --port 8080 --ctx-size 8192
+gnoma --provider llamacpp
 ```

-Then:
+Override the endpoint in `.gnoma/config.toml`:

-```sh
-./bin/gnoma --provider llamacpp
-# model name is taken from the server's /v1/models response
-```
-
-Default endpoint: `http://localhost:8080/v1`. Override:
-
-```sh
+```toml
 [provider.endpoints]
+ollama   = "http://myhost:11434/v1"
 llamacpp = "http://localhost:9090/v1"
 ```

 ---

-## Extensibility (M8)
-
-gnoma supports hooks, skills, MCP servers, and plugins.
-
-### MCP Servers
-
-Connect any [MCP](https://modelcontextprotocol.io)-compatible tool server:
-
-```toml
-[[mcp_servers]]
-name    = "git"
-command = "mcp-server-git"
-args    = ["--repo", "."]
-timeout = "30s"
-
-# Replace a built-in tool with an MCP tool
-[mcp_servers.replace_default]
-exec = "bash"   # MCP tool "exec" replaces gnoma's built-in "bash"
-```
-
-MCP tools appear as `mcp__{server}__{tool}` (e.g., `mcp__git__status`), or under the built-in name when using `replace_default`.
-
-### Skills
-
-Drop markdown files into `.gnoma/skills/` or `~/.config/gnoma/skills/`:
-
-```
-/skillname          # invoke a skill
-/skills             # list available skills
-```
-
-### Hooks
-
-Run shell commands on tool events:
-
-```toml
-[[hooks]]
-name         = "block-rm-rf"
-event        = "pre_tool_use"
-type         = "command"
-exec         = "bash-safety-check.sh"
-tool_pattern = "bash*"
-```
-
-### Plugins
-
-Bundle skills, hooks, and MCP configs into installable plugins:
-
-```sh
-gnoma plugin install ./my-plugin    # install from directory
-gnoma plugin list                   # list installed plugins
-```
-
-Plugins are pinned by SHA-256 of their `plugin.json` on first load
-(Trust-On-First-Use). A manifest that changes between runs is refused with a
-clear error and a re-enrollment hint. See [docs/plugins-trust.md](docs/plugins-trust.md)
-and [ADR-003](docs/essentials/decisions/003-plugin-trust.md).
-
---
-
-## Session Persistence
-
-Conversations are auto-saved to `.gnoma/sessions/` after each completed turn. On a crash you lose at most the current in-flight turn; all previously completed turns are safe.
-
-### Resume a session
-
-```sh
-gnoma --resume              # interactive session picker (↑↓ navigate, Enter load, Esc cancel)
-gnoma --resume <id>         # restore directly by ID
-gnoma -r                    # shorthand
-```
-
-Inside the TUI:
-
-```
-/resume                     # open picker
-/resume <id>                # restore by ID
-```
-
-### Incognito mode
-
-```sh
-gnoma --incognito           # no session saved, no quality scores updated
-```
-
-Toggle at runtime with `Ctrl+X`.
-
-### Config
-
-```toml
-[session]
-max_keep = 20   # how many sessions to retain per project (default: 20)
-```
-
-Sessions are stored per-project under `.gnoma/sessions/<id>/`. Quality scores (EMA routing data) are stored globally at `~/.config/gnoma/quality.json`.
-
---
-
 ## Config

-Config is read in priority order:
+Configuration merges (lowest → highest priority):

-1. `~/.config/gnoma/config.toml` — global
-2. `.gnoma/config.toml` — project-local (next to `go.mod` / `.git`)
-3. Environment variables
+1. Built-in defaults
+2. `~/.config/gnoma/config.toml` — global base
+3. `~/.config/gnoma/profiles/<name>.toml` — active profile (when profile mode is enabled)
+4. `<projectRoot>/.gnoma/config.toml` — project override
+5. Environment variables (`GNOMA_PROVIDER`, `GNOMA_MODEL`, `*_API_KEY`)
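The merge order above can be sketched as a lowest-to-highest fold. This is an illustration only: the `Settings` struct and `merge` helper are hypothetical names, and the sketch assumes an empty field means "not set"; gnoma's actual config types and loader are not shown here and may differ.

```go
package main

import "fmt"

// Settings is a hypothetical, simplified config layer.
// An empty string means the layer does not set that field.
type Settings struct {
	Provider string
	Model    string
}

// merge applies layers from lowest to highest priority;
// a later layer's non-empty field overrides earlier ones.
func merge(layers ...Settings) Settings {
	var out Settings
	for _, l := range layers {
		if l.Provider != "" {
			out.Provider = l.Provider
		}
		if l.Model != "" {
			out.Model = l.Model
		}
	}
	return out
}

func main() {
	defaults := Settings{Provider: "ollama", Model: "default-model"}
	global := Settings{Model: "global-model"}
	project := Settings{Provider: "llamacpp"}
	env := Settings{} // stands in for unset GNOMA_PROVIDER / GNOMA_MODEL

	fmt.Printf("%+v\n", merge(defaults, global, project, env))
	// prints {Provider:llamacpp Model:global-model}
}
```

Passing layers in priority order keeps the precedence rule in one place: whichever layer is listed last wins for any field it sets.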

-Example `.gnoma/config.toml`:
+Example global config:

 ```toml
 [provider]
@@ -243,21 +162,165 @@ ollama   = "http://localhost:11434/v1"
 llamacpp = "http://localhost:8080/v1"

 [permission]
-mode = "auto"   # auto | accept_edits | bypass | deny | plan
+mode = "auto"      # default | accept_edits | bypass | deny | plan | auto
+
+[session]
+max_keep = 20      # sessions retained per project
 ```

-Environment variable overrides: `GNOMA_PROVIDER`, `GNOMA_MODEL`.
+### Profiles
+
+Drop multiple configs under `~/.config/gnoma/profiles/` and switch with
+`--profile <name>` or `/profile <name>`. Each profile keeps its own router
+quality data and session history. Full details: [docs/profiles.md](docs/profiles.md).

 ---

-## Testing
+## SLM (small-language-model) routing

-```sh
-make test               # unit tests
-make test-integration   # integration tests (require real API keys)
-make cover              # coverage report → coverage.html
-make lint               # golangci-lint
-make check              # fmt + vet + lint + test
+gnoma can run a tiny local model alongside the main provider to:
+
+- **Classify** each prompt (task type + complexity + tool requirement) so the
+  router picks the right arm.
+- **Execute** trivial tasks itself (knowledge questions, single file reads,
+  anything with complexity ≤ 0.3), keeping the heavy provider for real work.
+
+```toml
+[slm]
+enabled = true
+backend = "auto"           # ollama | llamacpp | llamafile | openaicompat | auto | disabled
+model   = "reecdev/tiny3.5:500m"
 ```

-Integration tests are gated behind `//go:build integration` and skipped by default.
+Setup, presets, and verification: [docs/slm-backends.md](docs/slm-backends.md).
+The `auto` backend probes Ollama → llama.cpp → llamafile on startup and picks
+the first reachable option. Inspect with `gnoma slm status` and
+`gnoma router stats`.
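A rough sketch of the "first reachable" selection described above. The `pickBackend` function and the injected `reachable` check are hypothetical stand-ins (a real probe would presumably be an HTTP request with a short timeout), and returning `"disabled"` when nothing responds is an assumption, not documented behaviour.

```go
package main

import "fmt"

// pickBackend returns the first candidate for which reachable reports true.
// Falling back to "disabled" when none respond is an assumption of this sketch.
func pickBackend(candidates []string, reachable func(string) bool) string {
	for _, c := range candidates {
		if reachable(c) {
			return c
		}
	}
	return "disabled"
}

func main() {
	// Probe order taken from the text: Ollama, then llama.cpp, then llamafile.
	order := []string{"ollama", "llamacpp", "llamafile"}

	// Simulated reachability: Ollama is down, the other two are up.
	up := map[string]bool{"llamacpp": true, "llamafile": true}

	fmt.Println(pickBackend(order, func(name string) bool { return up[name] }))
	// prints llamacpp
}
```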
+
+---
+
+## Session persistence
+
+Sessions are auto-saved per project under `.gnoma/sessions/<id>/` after each
+completed turn. On a crash you lose at most the current in-flight turn.
+
+```sh
+gnoma --resume              # interactive picker
+gnoma --resume <id>         # restore by ID
+gnoma -r                    # shorthand
+gnoma --incognito           # no save, no router learning
+```
+
+Inside the TUI: `/resume`, `/resume <id>`, `Ctrl+X` (incognito toggle).
+
+Router-quality data (EMA scores) is stored at
+`~/.config/gnoma/quality.json` (or `quality-<profile>.json` in profile mode).
+
+---
+
+## Extensibility
+
+### MCP servers
+
+Connect any [MCP](https://modelcontextprotocol.io)-compatible server:
+
+```toml
+[[mcp_servers]]
+name    = "git"
+command = "mcp-server-git"
+args    = ["--repo", "."]
+timeout = "30s"
+
+# Optionally replace a built-in tool with an MCP one
+[mcp_servers.replace_default]
+exec = "bash"
+```
+
+MCP tools appear as `mcp__{server}__{tool}` unless mapped via `replace_default`.
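The naming rule can be illustrated with a small sketch. The `toolName` helper is hypothetical; it assumes `replace_default` maps an MCP tool name to the built-in tool it replaces (as in `exec = "bash"` above) and is not gnoma's actual implementation.

```go
package main

import "fmt"

// toolName returns the name an MCP tool is exposed under.
// If the tool is mapped in replace, it takes the built-in tool's name;
// otherwise it is namespaced as mcp__{server}__{tool}.
func toolName(server, tool string, replace map[string]string) string {
	if builtin, ok := replace[tool]; ok {
		return builtin
	}
	return fmt.Sprintf("mcp__%s__%s", server, tool)
}

func main() {
	replace := map[string]string{"exec": "bash"}

	fmt.Println(toolName("git", "status", replace)) // prints mcp__git__status
	fmt.Println(toolName("shell", "exec", replace)) // prints bash
}
```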
+
+### Skills
+
+Drop markdown files into `.gnoma/skills/` or `~/.config/gnoma/skills/`. Invoke
+with `/<skill-name>`. List with `/skills`.
+
+### Hooks
+
+Shell commands run on tool events (`pre_tool_use`, `post_tool_use`, etc.):
+
+```toml
+[[hooks]]
+name         = "block-rm-rf"
+event        = "pre_tool_use"
+type         = "command"
+exec         = "bash-safety-check.sh"
+tool_pattern = "bash*"
+```
+
+Ordering rules: [ADR-004](docs/essentials/decisions/004-posttooluse-hook-ordering.md).
+
+### Plugins
+
+Plugins bundle skills, hooks, and MCP server configs. Drop a plugin directory
+into `~/.config/gnoma/plugins/` (global) or `<project>/.gnoma/plugins/`
+(project-local); gnoma auto-discovers them on startup.
+
+Each plugin's `plugin.json` is pinned by SHA-256 on first load
+(Trust-On-First-Use). A manifest that changes between runs is refused with a
+clear error and a re-enrolment hint. Full model:
+[docs/plugins-trust.md](docs/plugins-trust.md) and
+[ADR-003](docs/essentials/decisions/003-plugin-trust.md).
+
+### Elfs (sub-agents)
+
+The `spawn_elfs` tool decomposes work into parallel sub-tasks. See
+[`internal/skill/skills/batch.md`](internal/skill/skills/batch.md) for the
+built-in batching skill.
+
+---
+
+## Subcommands
+
+| Command | What it does |
+|---|---|
+| `gnoma providers` | List every discovered provider, model, and CLI agent |
+| `gnoma profile list` / `show <name>` | Profile diagnostics |
+| `gnoma router stats` | Quality EMA + classifier source breakdown |
+| `gnoma slm setup` / `slm status` | Manage the llamafile-backed SLM |
+
+`gnoma --help` for the full flag set.
+
+---
+
+## Security
+
+gnoma runs tools and shell commands on your behalf. The
+[`internal/security`](internal/security) package canonicalises every path
+(TOCTOU-safe), gates network access through a configurable firewall, and
+scans tool output for secrets before it ever reaches the model. The
+`SafeProvider` boundary keeps incognito-mode data out of long-lived stores.
+
+Architecture references:
+
+- [docs/essentials/INDEX.md](docs/essentials/INDEX.md) — full architecture map
+- [docs/essentials/decisions/](docs/essentials/decisions/) — ADRs 001–004
+
+---
+
+## Development
+
+```sh
+make build          # ./bin/gnoma
+make test           # unit tests
+make test-integration  # //go:build integration — requires real API keys
+make cover          # coverage.html
+make lint           # golangci-lint
+make check          # fmt + vet + lint + test
+```
+
+Architecture, conventions, and TDD workflow: [CONTRIBUTING.md](CONTRIBUTING.md).
+
+---
+
+## License
+
+Apache License 2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).
@@ -1,51 +1,53 @@
 # Gnoma — TODO

-Active plans, newest first:
+Active work, newest first.

- **Post-audit security hardening** — **complete (2026-05-19)**. All 14
-  findings from the external review are closed across three waves +
-  one ADR:
+## In flight
+
+- **Distribution** — `.goreleaser.yml` is configured for
+  `linux`/`darwin`/`windows` × `amd64`/`arm64`. Still pending: first
+  tag + release pipeline trigger, optional Homebrew tap and Docker
+  image, mirror release publishing to GitHub.
+- **Compound tools (post-SLM Phase E)** — held until ≥50 SLM
+  observations inform which primitives are worth adding. See
+  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
+
+## Stable backlog (not in active phases)
+
+- **Thinking mode** (disabled / budget / adaptive) — M12.
+- **Structured output** with JSON schema validation — M12.
+- **Native agy JSON output** — switch the subprocess provider to
+  `--output-format stream-json` once the agy CLI supports it,
+  replacing the current prompt-augmentation fallback.
+- **SQLite session persistence** + serve mode — M10.
+- **Task learning** (pattern recognition, persistent tasks) — M11.
+- **Web UI** (`gnoma web`) — M15.
+- **OAuth / keyring** — M13.
+- **Observability** (feature flags, cost dashboards) — M14.
+- **PE / Mach-O support** — future, after ELF Phase 6.
+
+## History
+
+Completed initiatives, kept here as pointers to their plan files:
+
+- **Post-audit security hardening** — complete 2026-05-19. Three waves
+  + one ADR closed all 14 findings from the external review:
  - [Wave 1 — SafeProvider boundary](docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md)
  - [Wave 2 — Incognito coherence](docs/superpowers/plans/2026-05-19-security-wave2-incognito.md)
-  - [Wave 3 — Scanner + path hygiene](docs/superpowers/plans/2026-05-19-security-wave3-scanner-paths.md)
+  - Wave 3 — scanner + path hygiene (rolled out directly without a
+    plan file; see commits leading up to 2026-05-19 on `internal/security`)
  - [ADR-004 — PostToolUse hook ordering](docs/essentials/decisions/004-posttooluse-hook-ordering.md)
- **[`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md)**
-  — outstanding work after the SLM unlock session. Phases A (two-stage
-  tool routing), B (CLI agent binary override), C (user profiles), and
-  D (per-arm capability tags) are **complete**. Phase E (compound
-  tools) is held until ≥50 SLM observations inform which primitives are
-  worth adding.
- **[`docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md)**
-  — broader roadmap (PTY shell, USP integration, ELF, distribution).
-  Phase 4 ("Router Revisit") is superseded by the post-SLM plan above.
+- **Post-SLM unlock** —
+  [plan](docs/superpowers/plans/2026-05-19-post-slm-unlock.md). Phases
+  A–D complete (two-stage tool routing, CLI agent binary override,
+  user profiles, per-arm capability tags).
+- **2026-05-07 roadmap** —
+  [plan](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md). M1–M8
+  done; SLM classifier (Phase 3) complete; Phase 4 superseded by the
+  post-SLM plan.

-Phases (2026-05-07 roadmap):
-1. M8 Cleanup (wiring gaps)
-2. PTY Interactive Shell (`tea.ExecProcess`)
-3. SLM Task Classifier (Ollama HTTP, opt-in) — **complete**
-4. Router Revisit — **superseded by post-SLM plan**
-5. USP Security Integration
-6. ELF Binary Support (deferred/opportunistic)
-7. Distribution (CI trigger for goreleaser)
-
---
-
-## Stable Backlog (not in active phases)
-
- **Thinking mode** (disabled / budget / adaptive) — M12 in milestones
- **Structured output** with JSON schema validation — M12
- **Native agy JSON output** — update subprocess provider to use `--output-format stream-json` once supported by agy CLI, replacing the current prompt-augmentation fallback.
- **SQLite session persistence** + serve mode — M10
- **Task learning** (pattern recognition, persistent tasks) — M11
- **Web UI** (`gnoma web`) — M15
- **OAuth / keyring** — M13
- **Observability** (feature flags, cost dashboards) — M14
- **PE / Mach-O support** — future, after ELF Phase 6
-
---
-
-## Architecture References
+## Reference

 - Milestones: `docs/essentials/milestones.md`
 - Decisions: `docs/essentials/decisions/`
- ADR-013 (SLM routing, supersedes ADR-009): `docs/essentials/decisions/002-slm-routing.md`
+- ADR-002 (SLM routing, supersedes earlier ADR-009): `docs/essentials/decisions/002-slm-routing.md`
@@ -39,3 +39,4 @@ essentials:
 - [ADR-001 — Initial Decisions](decisions/001-initial-decisions.md)
 - [ADR-002 — SLM Routing](decisions/002-slm-routing.md)
 - [ADR-003 — Plugin Trust via TOFU Manifest Pinning](decisions/003-plugin-trust.md)
+- [ADR-004 — PostToolUse Hook Ordering](decisions/004-posttooluse-hook-ordering.md)
@@ -1,160 +0,0 @@
-> **Note (2026-05-07):** This document describes the `gemini-cli` (Node.js) implementation.
-> The specifics — LiteRT-LM runtime, daemon/PID management, `litert-lm pull`, React/Ink UI —
-> are Node.js artifacts and do not apply to gnoma. The **conceptually relevant part** is the
-> Complexity Rubric and the `GemmaClassifierStrategy` JSON interface, which informed the Go
-> `SLMClassifier` design in Phase 3 of `docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`.
-> For the Go implementation, see ADR-013 (`docs/essentials/decisions/002-slm-routing.md`).
-
-# Gemini CLI Local Model Routing (/gemma) Architecture
-
-The `/gemma` integration in the `gemini-cli` uses a local LLM to perform "Model Routing". It automatically decides whether to use a cheaper/faster model (Flash) or a more powerful one (Pro) based on the user's request.
-
-## Core Architecture
-*   **Engine:** Uses **LiteRT-LM**, a lightweight runtime that serves Gemma models via a Gemini-compatible HTTP API.
-*   **Model:** Specifically uses a quantized **Gemma 3 1B** model (`gemma3-1b-gpu-custom`). It's ~1GB and runs locally with low latency (~100-200ms for classification).
-*   **Orchestration:** The CLI manages the LiteRT server as a background daemon, tracking its state via PID files and logs.
-*   **Integration:** A `GemmaClassifierStrategy` is injected into the core `ModelRouterService`. It flattens recent chat history, sends it to the local Gemma model with a strict "Complexity Rubric," and uses the JSON response to switch models dynamically.
-
---
-
-## Integration Todo List
-
-### 1. Infrastructure & Asset Management
- [ ] **Platform Detection:** Logic to map OS/Arch to the correct LiteRT-LM binary download URL.
- [ ] **Safe Installer:** Implementation of binary download + SHA256 checksum verification + permission handling (`chmod +x`, macOS quarantine removal).
- [ ] **Model Manager:** Wrapper for the `litert-lm pull` command to download and verify the 1GB Gemma model.
-
-### 2. Process & Server Management
- [ ] **Background Daemon:** Implementation of `spawn(..., { detached: true })` to keep the LiteRT server running independently of the CLI session.
- [ ] **State Tracking:** A PID-file system to manage server lifecycle (start/stop/status) and prevent port collisions.
- [ ] **Auto-Start Logic:** A manager class (`LiteRtServerManager`) that checks server health on CLI startup and launches it if enabled in settings.
-
-### 3. Routing Logic (The "Brain")
- [ ] **Complexity Rubric:** A specialized system prompt that defines what constitutes a "SIMPLE" vs "COMPLEX" task.
- [ ] **Context Flattener:** Utility to compress the last ~4-20 turns of chat history into a prompt suitable for a small 1B model.
- [ ] **Strategy Implementation:** The `GemmaClassifierStrategy` class to handle the local API call, parse the JSON "reasoning," and return the model decision.
-
-### 4. User Experience (CLI & UI)
- [ ] **Management Commands:** Commands like `gemini gemma {setup|start|stop|status|logs}` for lifecycle and troubleshooting.
- [ ] **Slash Command:** A built-in `/gemma` command that queries the local server health and displays a status panel inside a session.
- [ ] **React/Ink UI:** A status component to show visual indicators (green/red) for the binary, model, and server state.
-
-### 5. Configuration & Safety
- [ ] **Scoped Settings:** Separate "User" settings (binary path) from "Workspace" settings (router enabled/disabled for a specific project).
- [ ] **Failure Resilience:** Logic to gracefully fall back to the default model if the local classifier times out or fails.
-
---
-
-## Routing Prompts
-
-These are the exact prompts used by the `gemini-cli` to force the small 1B model to output structured JSON with strict reasoning criteria.
-
-### 1. The Complexity Rubric
-```markdown
-### Complexity Rubric
-A task is COMPLEX (Choose \`pro\`) if it meets ONE OR MORE of the following criteria:
-1.  **High Operational Complexity (Est. 4+ Steps/Tool Calls):** Requires dependent actions, significant planning, or multiple coordinated changes.
-2.  **Strategic Planning & Conceptual Design:** Asking "how" or "why." Requires advice, architecture, or high-level strategy.
-3.  **High Ambiguity or Large Scope (Extensive Investigation):** Broadly defined requests requiring extensive investigation.
-4.  **Deep Debugging & Root Cause Analysis:** Diagnosing unknown or complex problems from symptoms.
-A task is SIMPLE (Choose \`flash\`) if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls). Operational simplicity overrides strategic phrasing.
-```
-
-### 2. Output Format Enforcement
-```markdown
-### Output Format
-Respond *only* in JSON format like this:
-{
-  "reasoning": Your reasoning...
-  "model_choice": Either flash or pro
-}
-And you must follow the following JSON schema:
-{
-  "type": "object",
-  "properties": {
-    "reasoning": {
-      "type": "string",
-      "description": "A brief summary of the user objective, followed by a step-by-step explanation for the model choice, referencing the rubric."
-    },
-    "model_choice": {
-      "type": "string",
-      "enum": ["flash", "pro"]
-    }
-  },
-  "required": ["reasoning", "model_choice"]
-}
-You must ensure that your reasoning is no more than 2 sentences long and directly references the rubric criteria.
-When making your decision, the user's request should be weighted much more heavily than the surrounding context when making your determination.
-```
-
-### 3. The Main System Prompt
-```markdown
-### Role
-You are the **Lead Orchestrator** for an AI system. You do not talk to users. Your sole responsibility is to analyze the **Chat History** and delegate the **Current Request** to the most appropriate **Model** based on the request's complexity.
-
-### Models
-Choose between \`flash\` (SIMPLE) or \`pro\` (COMPLEX).
-1.  \`flash\`: A fast, efficient model for simple, well-defined tasks.
-2.  \`pro\`: A powerful, advanced model for complex, open-ended, or multi-step tasks.
-
-[... Injects COMPLEXITY_RUBRIC here ...]
-
-[... Injects OUTPUT_FORMAT here ...]
-
-### Examples
-**Example 1 (Strategic Planning):**
-*User Prompt:* "How should I architect the data pipeline for this new analytics service?"
-*Your JSON Output:*
-{
-  "reasoning": "The user is asking for high-level architectural design and strategy. This falls under 'Strategic Planning & Conceptual Design'.",
-  "model_choice": "pro"
-}
-**Example 2 (Simple Tool Use):**
-*User Prompt:* "list the files in the current directory"
-*Your JSON Output:*
-{
-  "reasoning": "This is a direct command requiring a single tool call (ls). It has Low Operational Complexity (1 step).",
-  "model_choice": "flash"
-}
-**Example 3 (High Operational Complexity):**
-*User Prompt:* "I need to add a new 'email' field to the User schema in 'src/models/user.ts', migrate the database, and update the registration endpoint."
-*Your JSON Output:*
-{
-  "reasoning": "This request involves multiple coordinated steps across different files and systems. This meets the criteria for High Operational Complexity (4+ steps).",
-  "model_choice": "pro"
-}
-**Example 4 (Simple Read):**
-*User Prompt:* "Read the contents of 'package.json'."
-*Your JSON Output:*
-{
-  "reasoning": "This is a direct command requiring a single read. It has Low Operational Complexity (1 step).",
-  "model_choice": "flash"
-}
-**Example 5 (Deep Debugging):**
-*User Prompt:* "I'm getting an error 'Cannot read property 'map' of undefined' when I click the save button. Can you fix it?"
-*Your JSON Output:*
-{
-  "reasoning": "The user is reporting an error symptom without a known cause. This requires investigation and falls under 'Deep Debugging'.",
-  "model_choice": "pro"
-}
-**Example 6 (Simple Edit despite Phrasing):**
-*User Prompt:* "What is the best way to rename the variable 'data' to 'userData' in 'src/utils.js'?"
-*Your JSON Output:*
-{
-  "reasoning": "Although the user uses strategic language ('best way'), the underlying task is a localized edit. The operational complexity is low (1-2 steps).",
-  "model_choice": "flash"
-}
-```
-
-### 4. The Per-Request Prompt Structure
-For every routing decision, the CLI flattens the last ~4 turns of chat history and appends the new user request.
-
-```markdown
-You are provided with a **Chat History** and the user's **Current Request** below.
-
-#### Chat History:
-[... Flattened text of the last 4 turns, excluding tool calls ...]
-
-#### Current Request:
-"[... The actual text of what the user just typed ...]"
-```