docs: refresh README/CONTRIBUTING/AGENTS/TODO, add LICENSE, drop obsolete files

Top-level docs were stale and the .gitea/ issue templates referenced a
workflow that is no longer in use.

- README: rewrite around the current feature set (SLM routing, profiles,
  plugin TOFU, SafeProvider boundary, current model defaults). Add a
  pre-built-binary install section plus Docker (ghcr.io) install path
  for users without a Go toolchain. Document the GitHub mirror.
- CONTRIBUTING: drop the dead issue-template reference, note Gitea
  upstream + GitHub mirror split, expand the package map and test-target
  table.
- AGENTS: rebuild as a domain glossary (Elf / Arm / Turn / SafeProvider /
  Incognito / Profile) plus non-obvious conventions an outside agent
  needs and would not infer from the code.
- TODO: trim completed waves into a History section, fix a broken
  link to the never-written Wave 3 plan file, surface active backlog.
- docs/essentials/INDEX: add ADR-004 (PostToolUse hook ordering) to the
  ADR list.
- LICENSE + NOTICE: adopt Apache License 2.0. Patent grant matters
  because gnoma bundles SDKs from Anthropic / OpenAI / Google / Mistral
  and ships derivative tooling that runs untrusted MCP servers.
- Delete .gitea/issue_template/ and gemma-integration-analysis.md
  (latter is obsolete per its own preamble — Node.js-specific notes
  that don't apply to the Go implementation).
2026-05-20 03:13:40 +02:00
parent 99fa0ff08e
commit 5170c73dac
10 changed files with 640 additions and 548 deletions
-58
@@ -1,58 +0,0 @@
name: Bug Report
about: Report something that isn't working correctly
labels:
  - bug
body:
  - type: textarea
    id: description
    attributes:
      label: Description
      description: What happened? What did you expect?
    validations:
      required: true
  - type: textarea
    id: reproduction
    attributes:
      label: Steps to reproduce
      description: Minimal steps to trigger the issue
      placeholder: |
        1. Run `gnoma --provider anthropic`
        2. Type "..."
        3. See error
    validations:
      required: true
  - type: input
    id: version
    attributes:
      label: gnoma version
      description: Output of `gnoma --version`
      placeholder: "gnoma 0.1.0 (abc1234, 2026-04-12)"
    validations:
      required: true
  - type: input
    id: os
    attributes:
      label: OS / Architecture
      placeholder: "Linux x86_64 / macOS arm64 / Windows amd64"
    validations:
      required: true
  - type: dropdown
    id: provider
    attributes:
      label: Provider
      options:
        - mistral
        - anthropic
        - openai
        - google
        - ollama
        - llamacpp
        - N/A
    validations:
      required: false
  - type: textarea
    id: logs
    attributes:
      label: Relevant logs
      description: Run with `--verbose` for debug output
      render: shell
@@ -1,42 +0,0 @@
name: Feature Request
about: Suggest an improvement or new capability
labels:
  - enhancement
body:
  - type: textarea
    id: problem
    attributes:
      label: Problem
      description: What are you trying to do that gnoma doesn't support well?
    validations:
      required: true
  - type: textarea
    id: solution
    attributes:
      label: Proposed solution
      description: How would you like this to work?
    validations:
      required: true
  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives considered
      description: Other approaches you've thought about
    validations:
      required: false
  - type: dropdown
    id: area
    attributes:
      label: Area
      options:
        - providers
        - tools
        - router
        - TUI
        - MCP / plugins
        - elfs (sub-agents)
        - security
        - config
        - other
    validations:
      required: false
+70 -21
@@ -1,26 +1,75 @@
# AGENTS.md
## Domain Terminology
- **Elf**: An agent instance.
- **Turn**: A complete sequence of agentic reasoning and tool execution.
- **Routing Arm**: A specific model/provider selected by the `Router` for a task.
- **Stream Event**: Discrete updates during LLM generation (e.g., `EventTextDelta`, `EventToolCallStart`, `EventToolResult`).
Conventions for AI assistants working in this repository. CLAUDE.md
covers Go style, commits, and TDD policy; this file adds gnoma-specific
domain knowledge those rules do not capture.
## Build & Test Targets
- **Run**: `make run`
- **Test (Verbose)**: `make test-v`
- **Integration Tests**: `make test-integration` (requires `//go:build integration`)
## Domain glossary
## Key Dependencies
- **Mistral**: `github.com/VikingOwl91/mistral-go-sdk`
- **Anthropic**: `github.com/anthropics/anthropic-sdk-go`
- **OpenAI**: `github.com/openai/openai-go`
- **Google GenAI**: `google.golang.org/genai`
- **TUI**: `charm.land/bubbletea/v2`, `charm.land/lipgloss/v2`
- **Other**: `charm.land/bubbles/v2`, `charm.land/glamour/v2`, `github.com/pkoukk/tiktoken-go`
| Term | Meaning |
|---|---|
| **Elf** | A sub-agent instance, spawned via `spawn_elfs`. |
| **Turn** | One complete `stream → tool → re-query` cycle in the engine. |
| **Arm** | A `(provider, model)` pair the router can select. Registered with cost and capability metadata. |
| **Router** | Multi-armed-bandit selector that picks an Arm per Turn from the registered set. |
| **SLM** | Small language model running locally for prompt classification and trivial-task execution. |
| **Stream Event** | Discriminated-union update emitted while a provider streams: `EventTextDelta`, `EventToolCallStart`, `EventToolResult`, etc. See `internal/stream/event.go`. |
| **SafeProvider** | The sealed boundary that gates outbound provider calls — every Provider implementation embeds the unexported marker. See `internal/security`. |
| **Incognito** | Per-turn mode that disables session persistence and router learning. |
| **Profile** | A named config overlay under `~/.config/gnoma/profiles/`. Switches keys, models, and per-profile router quality data. |
## Environment Variables
- `MISTRAL_API_KEY`: Required for Mistral provider.
- `ANTHROPIC_API_KEY`: Required for Anthropic provider.
- `OPENAI_API_KEY`: Required for OpenAI provider.
- `GOOGLE_API_KEY`: Required for Google provider.
## Build & test targets (beyond standard)
| Target | Purpose |
|---|---|
| `make test-v` | Verbose unit tests |
| `make test-integration` | Runs `//go:build integration` tests (real API calls) |
| `make check` | fmt + vet + lint + test (use before committing) |
| `go test -bench=. ./internal/router/` | Router benchmarks |
## Provider env vars
| Provider | Primary | Alternative |
|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | `ANTHROPICS_API_KEY` |
| OpenAI | `OPENAI_API_KEY` | — |
| Google | `GEMINI_API_KEY` | `GOOGLE_API_KEY` |
| Mistral | `MISTRAL_API_KEY` | — |
`GNOMA_PROVIDER` and `GNOMA_MODEL` override the resolved config.
## Non-obvious conventions
- **Discriminated unions** are structs with a `Type` field and pointer
payloads — not Go interfaces. See `internal/stream/event.go` and
`internal/message`.
- **Pull-based iterators** follow the `Next() / Current() / Err() / Close()`
shape. Streams in `internal/provider/*/stream.go` are the canonical examples.
- **`json.RawMessage`** flows through `tool.Definition.Parameters` and tool
arguments untouched — never marshal/unmarshal in the middle.
- **Capabilities and ContextWindow** come from `internal/provider`
`inferXxxModelCapabilities` per provider; updating model lists also updates
these tables and the `ratelimits.go` map.
- **Hook ordering** matters for `PostToolUse`. See ADR-004.
- **Plugin trust** is TOFU pinning — see `internal/plugin/pinstore.go` and
ADR-003.
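A minimal sketch of the first two conventions: a struct-based discriminated union consumed through a pull-based iterator. All type and field names here (`EventType`, `TextDelta`, `sliceStream`, `drain`) are hypothetical stand-ins; the real definitions live in `internal/stream/event.go` and the provider `stream.go` files and may differ.

```go
package main

import "fmt"

// EventType is the discriminant. These names are illustrative only.
type EventType int

const (
	EventTextDelta EventType = iota
	EventToolCallStart
)

// TextDelta and ToolCallStart are hypothetical payload types.
type TextDelta struct{ Text string }
type ToolCallStart struct{ Name string }

// Event is a struct-based union: Type selects which pointer payload is set.
type Event struct {
	Type      EventType
	TextDelta *TextDelta
	ToolCall  *ToolCallStart
}

// sliceStream is a toy pull-based iterator with the
// Next / Current / Err / Close shape, backed by a slice.
type sliceStream struct {
	events []Event
	pos    int
	cur    Event
	closed bool
}

func (s *sliceStream) Next() bool {
	if s.closed || s.pos >= len(s.events) {
		return false
	}
	s.cur = s.events[s.pos]
	s.pos++
	return true
}

func (s *sliceStream) Current() Event { return s.cur }

// Err always returns nil here; a real stream would surface I/O errors.
func (s *sliceStream) Err() error { return nil }

func (s *sliceStream) Close() error {
	s.closed = true
	return nil
}

// drain pulls every event, concatenating text deltas and recording tool names.
func drain(s *sliceStream) (text string, tools []string, err error) {
	defer s.Close()
	for s.Next() {
		switch ev := s.Current(); ev.Type {
		case EventTextDelta:
			text += ev.TextDelta.Text
		case EventToolCallStart:
			tools = append(tools, ev.ToolCall.Name)
		}
	}
	return text, tools, s.Err()
}

func main() {
	s := &sliceStream{events: []Event{
		{Type: EventTextDelta, TextDelta: &TextDelta{Text: "hello "}},
		{Type: EventToolCallStart, ToolCall: &ToolCallStart{Name: "bash"}},
		{Type: EventTextDelta, TextDelta: &TextDelta{Text: "world"}},
	}}
	text, tools, err := drain(s)
	fmt.Println(text, tools, err)
}
```

The consumer switches on `Type` and dereferences only the matching payload, which is why a nil check on the other pointers is never needed.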
## Sub-agent (elf) etiquette
When spawning elfs:
- One `spawn_elfs` call for all parallel work; never spawn one at a time.
- Read-only tasks on disjoint files parallelize cleanly.
- Writes to the same file must be sequenced into one elf.
- Cap each batch at 5–7 elfs.
See `internal/skill/skills/batch.md` for the canonical batching template.
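The etiquette above can be sketched as a small planning step: tasks touching the same file are merged into one elf so their writes run in sequence, and the resulting elfs are split into capped batches, assuming one batch per `spawn_elfs` call. `Task`, `ElfJob`, and `planElfs` are hypothetical; the batch cap is passed in as a parameter.

```go
package main

import "fmt"

// Task is one unit of work aimed at a single file (hypothetical type).
type Task struct {
	File string
	Goal string
}

// ElfJob is the ordered list of tasks one elf runs sequentially.
type ElfJob []Task

// planElfs merges tasks that touch the same file into one ElfJob, so writes
// to that file are sequenced, then splits the jobs into batches of at most
// maxPerBatch. First-seen file order is preserved.
func planElfs(tasks []Task, maxPerBatch int) [][]ElfJob {
	if maxPerBatch <= 0 {
		panic("maxPerBatch must be positive")
	}
	index := map[string]int{}
	var jobs []ElfJob
	for _, t := range tasks {
		i, seen := index[t.File]
		if !seen {
			i = len(jobs)
			index[t.File] = i
			jobs = append(jobs, nil)
		}
		jobs[i] = append(jobs[i], t)
	}
	var batches [][]ElfJob
	for len(jobs) > 0 {
		n := min(maxPerBatch, len(jobs))
		batches = append(batches, jobs[:n])
		jobs = jobs[n:]
	}
	return batches
}

func main() {
	tasks := []Task{
		{"a.go", "read"}, {"b.go", "read"}, {"a.go", "edit"}, {"c.go", "read"},
	}
	for i, b := range planElfs(tasks, 2) {
		fmt.Println("batch", i, "elfs:", len(b))
	}
}
```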
## Reference docs
- Architecture map: `docs/essentials/INDEX.md`
- ADRs: `docs/essentials/decisions/`
- Profiles: `docs/profiles.md`
- SLM backends: `docs/slm-backends.md`
- Plugin trust: `docs/plugins-trust.md`
- Router benchmarks: `docs/benchmarks/README.md`
+49 -19
@@ -1,5 +1,10 @@
# Contributing to gnoma
The upstream repository lives at
<https://somegit.dev/Owlibou/gnoma> and is mirrored to
<https://github.com/VikingOwl91/gnoma>. PRs are accepted on the upstream
(Gitea) instance; the GitHub mirror is read-only.
## Setup
```sh
@@ -11,34 +16,43 @@ make lint # requires golangci-lint
## Development workflow
1. Create a branch from `main`
2. Write tests first (TDD) — table-driven, `t.TempDir()` for filesystem tests
3. `make check` (fmt + vet + lint + test) must pass
4. Commit with conventional messages: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`
1. Branch from `main`.
2. Write tests first (TDD). Table-driven where possible, `t.TempDir()` for
filesystem tests, `testing/synctest` for concurrent ones.
3. `make check` (fmt + vet + lint + test) must pass.
4. Conventional commits: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`,
`chore:`. **No co-signing or "Generated-by" trailers.**
## Code style
- Go 1.26 idioms (`new(expr)`, `errors.AsType[E]`)
- Structured logging with `log/slog`
- `json.RawMessage` for tool schemas (zero-cost passthrough)
- Functional options for complex configuration
- Short, lowercase package names — no underscores
- Go 1.26 idioms (`new(expr)`, `errors.AsType[E]`, `sync.WaitGroup.Go`).
- Structured logging with `log/slog`.
- `json.RawMessage` for tool schemas (zero-cost passthrough).
- Functional options for complex configuration.
- Short, lowercase package names — no underscores.
- Discriminated unions via struct + type discriminant, not interfaces.
- Pull-based stream iterators: `Next() / Current() / Err() / Close()`.
## Testing
- Unit tests: `make test`
- Integration tests (require API keys): `make test-integration`
- Coverage: `make cover`
- Benchmarks: `go test -bench=. ./internal/router/`
| Command | What it runs |
|---|---|
| `make test` | unit tests |
| `make test-integration` | tests behind `//go:build integration` — requires real API keys |
| `make cover` | coverage → `coverage.html` |
| `make lint` | `golangci-lint run ./...` |
| `make check` | fmt + vet + lint + test |
| `go test -bench=. ./internal/router/` | router benchmarks |
Integration tests use `//go:build integration` and are skipped by default.
Integration tests are skipped by default.
## Architecture
Read `docs/essentials/INDEX.md` before making architectural changes. Key packages:
Read [`docs/essentials/INDEX.md`](docs/essentials/INDEX.md) before changing
architectural boundaries. Key packages:
| Package | Purpose |
|---------|---------|
|---|---|
| `internal/engine` | Agentic loop (stream → tool → re-query) |
| `internal/router` | Multi-armed bandit arm selection |
| `internal/provider` | LLM provider adapters |
@@ -46,8 +60,24 @@ Read `docs/essentials/INDEX.md` before making architectural changes. Key package
| `internal/mcp` | MCP client (JSON-RPC over stdio) |
| `internal/plugin` | Plugin manifest, loader, manager |
| `internal/elf` | Sub-agent (elf) system |
| `internal/tui` | Bubble Tea terminal UI |
| `internal/security` | SafeProvider boundary, firewall, output scanner |
| `internal/skill` | Skill registry and templating |
| `internal/slm` | Small-language-model classifier + arm |
| `internal/tui` | Bubble Tea v2 terminal UI |
## Issues
ADRs live in [`docs/essentials/decisions/`](docs/essentials/decisions/).
Use the issue templates when filing bugs or requesting features. Include reproduction steps, expected behavior, and gnoma version (`gnoma --version`).
## Reporting issues
File issues on the upstream Gitea instance with:
- A short reproduction (commands, prompts, configs that triggered the bug).
- Expected vs. actual behavior.
- `gnoma --version` output and OS / architecture.
- Provider and model in use, if relevant.
- `--verbose` log output if it sheds light.
## License
By contributing you agree your work is licensed under the
[Apache License 2.0](LICENSE).
+202
@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
+5
@@ -0,0 +1,5 @@
gnoma
Copyright 2026 vikingowl
This product includes software developed at the gnoma project
(https://somegit.dev/Owlibou/gnoma).
+270 -207
@@ -1,234 +1,153 @@
# gnoma
**A provider-agnostic agentic coding assistant built in Go.** gnoma routes tasks to the best available LLM — cloud or local — through a multi-armed bandit router, while tools, hooks, skills, MCP servers, and plugins keep it extensible. Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called **elfs** (elf owl).
**A provider-agnostic agentic coding assistant in Go.** gnoma routes each prompt
to the best available model — cloud or local — through a multi-armed bandit
router, executes tools on your behalf, and stays extensible through hooks,
skills, MCP servers, and plugins.
Named after the northern pygmy-owl (*Glaucidium gnoma*); agents are called
**elfs** (elf owl).
- **Upstream:** <https://somegit.dev/Owlibou/gnoma>
- **GitHub mirror:** <https://github.com/VikingOwl91/gnoma>
---
## Install
### Pre-built binary (no Go toolchain required)
Releases are built by [GoReleaser](.goreleaser.yml) for
`linux`, `darwin`, and `windows` × `amd64`/`arm64` as static (`CGO_ENABLED=0`)
archives. Until the first tag is cut, see "Build from source" below.
Once releases are published:
```sh
# Pick the archive matching your OS/arch from the releases page:
# https://somegit.dev/Owlibou/gnoma/releases (upstream)
# https://github.com/VikingOwl91/gnoma/releases (mirror)
# Linux/macOS one-liner (substitute the asset URL):
curl -fsSL <ARCHIVE_URL> | tar -xz -C /tmp
sudo mv /tmp/gnoma /usr/local/bin/
gnoma --version
```
Windows: download the `_windows_*.zip`, extract `gnoma.exe`, and put it on
`%PATH%`.
### Docker
Multi-arch images (`linux/amd64`, `linux/arm64`) are published to GitHub
Container Registry on each tagged release:
```sh
docker pull ghcr.io/vikingowl91/gnoma:latest
docker run --rm -it \
-v "$PWD:/workspace" \
-e ANTHROPIC_API_KEY \
ghcr.io/vikingowl91/gnoma:latest --version
```
Mount your project as `/workspace` (the image's working directory) and pass
provider keys via `-e`.
### Go users
```sh
go install somegit.dev/Owlibou/gnoma/cmd/gnoma@latest # latest tagged
go install somegit.dev/Owlibou/gnoma/cmd/gnoma@main # bleeding edge
```
### Build from source
```sh
git clone https://somegit.dev/Owlibou/gnoma && cd gnoma
make build # → ./bin/gnoma
make install # → $GOPATH/bin/gnoma
```
Requires Go 1.26+.
---
## Quickstart
```sh
# Install
go install somegit.dev/Owlibou/gnoma/cmd/gnoma@latest
# Set at least one provider key (or run a local model — see Providers below).
export ANTHROPIC_API_KEY=sk-ant-...
# Or build from source
git clone https://somegit.dev/Owlibou/gnoma && cd gnoma
make build # binary at ./bin/gnoma
# Set at least one provider key
export ANTHROPIC_API_KEY=sk-ant-... # or OPENAI_API_KEY, MISTRAL_API_KEY, GEMINI_API_KEY
# Run
gnoma # interactive TUI
echo "list files" | gnoma # pipe mode
gnoma --provider ollama # use a local model
gnoma # interactive TUI
echo "list files" | gnoma # pipe / one-shot mode
gnoma --provider ollama # use a local model
gnoma --version
```
## Build
Inside the TUI, `Ctrl+X` toggles **incognito** (no session saved, no router
learning); `/help` lists slash commands; `Esc` cancels an in-flight turn.
```sh
make build # ./bin/gnoma
make install # $GOPATH/bin/gnoma
```
---
## Providers
### Anthropic
| Provider | Env var | Default model | Also available |
|---|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-6` | `claude-opus-4-7`, `claude-haiku-4-5-20251001` |
| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` | `gpt-5.5-pro`, `gpt-5.2`, `gpt-5.2-chat-latest` |
| Google (Gemini) | `GEMINI_API_KEY` (alt: `GOOGLE_API_KEY`) | `gemini-3.5-flash` | `gemini-3.1-pro-preview`, `gemini-3.1-flash-lite` |
| Mistral | `MISTRAL_API_KEY` | `mistral-large-latest` (Mistral Large 3) | `mistral-medium-3.5`, `magistral-medium-2509` |
| Ollama (local) | — | `qwen3:8b` (override with `--model`) | any model on your Ollama instance |
| llama.cpp (local) | — | reported by `/v1/models` | n/a |
| Subprocess (`claude`, `gemini`, `agy` CLIs) | provider-specific | binary name | configurable via `[cli_agents]` |
Override per-invocation:
```sh
export ANTHROPIC_API_KEY=sk-ant-...
./bin/gnoma --provider anthropic
./bin/gnoma --provider anthropic --model claude-opus-4-5-20251001
gnoma --provider anthropic --model claude-opus-4-7
gnoma --provider openai --model gpt-5.5-pro # GPT-5.5 is the default; pro is the higher-accuracy tier
gnoma --provider google --model gemini-3.1-pro-preview
gnoma --provider ollama --model qwen2.5-coder:3b
gnoma --provider llamacpp # model picked from server
```
Integration tests hit the real API — keep a key in env:
`gnoma providers` prints every discovered provider, model, and CLI agent.
### Local models
Start your local server, then point gnoma at it:
```sh
go test -tags integration ./internal/provider/...
```
# Ollama (default http://localhost:11434/v1)
ollama pull qwen2.5-coder:3b
gnoma --provider ollama --model qwen2.5-coder:3b
---
### OpenAI
```sh
export OPENAI_API_KEY=sk-proj-...
./bin/gnoma --provider openai
./bin/gnoma --provider openai --model gpt-4o
```
---
### Mistral
```sh
export MISTRAL_API_KEY=...
./bin/gnoma --provider mistral
```
---
### Google (Gemini)
```sh
export GEMINI_API_KEY=AIza...
./bin/gnoma --provider google
./bin/gnoma --provider google --model gemini-2.0-flash
```
---
### Ollama (local)
Start Ollama and pull a model, then:
```sh
./bin/gnoma --provider ollama --model gemma4:latest
./bin/gnoma --provider ollama --model qwen3:8b # default if --model omitted
```
Default endpoint: `http://localhost:11434/v1`. Override via config or env:
```sh
# .gnoma/config.toml
[provider]
default = "ollama"
model = "gemma4:latest"
[provider.endpoints]
ollama = "http://myhost:11434/v1"
```
---
### llama.cpp (local)
Start the llama.cpp server:
```sh
# llama.cpp (default http://localhost:8080/v1)
llama-server --model /path/to/model.gguf --port 8080 --ctx-size 8192
gnoma --provider llamacpp
```
Then:
Override the endpoint in `.gnoma/config.toml`:
```sh
./bin/gnoma --provider llamacpp
# model name is taken from the server's /v1/models response
```
Default endpoint: `http://localhost:8080/v1`. Override:
```sh
```toml
[provider.endpoints]
ollama = "http://myhost:11434/v1"
llamacpp = "http://localhost:9090/v1"
```
---
## Extensibility (M8)
gnoma supports hooks, skills, MCP servers, and plugins.
### MCP Servers
Connect any [MCP](https://modelcontextprotocol.io)-compatible tool server:
```toml
[[mcp_servers]]
name = "git"
command = "mcp-server-git"
args = ["--repo", "."]
timeout = "30s"
# Replace a built-in tool with an MCP tool
[mcp_servers.replace_default]
exec = "bash" # MCP tool "exec" replaces gnoma's built-in "bash"
```
MCP tools appear as `mcp__{server}__{tool}` (e.g., `mcp__git__status`), or under the built-in name when using `replace_default`.
### Skills
Drop markdown files into `.gnoma/skills/` or `~/.config/gnoma/skills/`:
```
/skillname # invoke a skill
/skills # list available skills
```
### Hooks
Run shell commands on tool events:
```toml
[[hooks]]
name = "block-rm-rf"
event = "pre_tool_use"
type = "command"
exec = "bash-safety-check.sh"
tool_pattern = "bash*"
```
### Plugins
Bundle skills, hooks, and MCP configs into installable plugins:
```sh
gnoma plugin install ./my-plugin # install from directory
gnoma plugin list # list installed plugins
```
Plugins are pinned by SHA-256 of their `plugin.json` on first load
(Trust-On-First-Use). A manifest that changes between runs is refused with a
clear error and a re-enrollment hint. See [docs/plugins-trust.md](docs/plugins-trust.md)
and [ADR-003](docs/essentials/decisions/003-plugin-trust.md).
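Conceptually, the TOFU check reduces to comparing a stored digest with a freshly computed one. A minimal in-memory Go sketch, assuming a map from plugin name to hex SHA-256 digest; `PinStore` and its error text are hypothetical, and the real `internal/plugin/pinstore.go` persists pins and may key them differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PinStore maps plugin name to the hex SHA-256 of its plugin.json.
// In-memory only; a real store would persist pins across runs.
type PinStore map[string]string

// Check implements trust-on-first-use: the first manifest seen for a plugin
// is pinned; a later manifest with a different digest is refused.
func (p PinStore) Check(name string, manifest []byte) error {
	sum := sha256.Sum256(manifest)
	digest := hex.EncodeToString(sum[:])
	pinned, ok := p[name]
	if !ok {
		p[name] = digest // first use: enroll
		return nil
	}
	if pinned != digest {
		return fmt.Errorf("plugin %q: manifest changed (pinned %s..., got %s...); re-enroll if expected",
			name, pinned[:12], digest[:12])
	}
	return nil
}

func main() {
	store := PinStore{}
	fmt.Println(store.Check("git-tools", []byte(`{"name":"git-tools","version":"1"}`)))
	fmt.Println(store.Check("git-tools", []byte(`{"name":"git-tools","version":"2"}`)) != nil)
}
```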
---
## Session Persistence
Conversations are auto-saved to `.gnoma/sessions/` after each completed turn. On a crash you lose at most the current in-flight turn; all previously completed turns are safe.
### Resume a session
```sh
gnoma --resume # interactive session picker (↑↓ navigate, Enter load, Esc cancel)
gnoma --resume <id> # restore directly by ID
gnoma -r # shorthand
```
Inside the TUI:
```
/resume # open picker
/resume <id> # restore by ID
```
### Incognito mode
```sh
gnoma --incognito # no session saved, no quality scores updated
```
Toggle at runtime with `Ctrl+X`.
### Config
```toml
[session]
max_keep = 20 # how many sessions to retain per project (default: 20)
```
Sessions are stored per-project under `.gnoma/sessions/<id>/`. Quality scores (EMA routing data) are stored globally at `~/.config/gnoma/quality.json`.
---
## Config
Config is read in priority order:
Configuration merges (lowest → highest priority):
1. `~/.config/gnoma/config.toml` — global
2. `.gnoma/config.toml` — project-local (next to `go.mod` / `.git`)
3. Environment variables
1. Built-in defaults
2. `~/.config/gnoma/config.toml` — global base
3. `~/.config/gnoma/profiles/<name>.toml` — active profile (when profile mode is enabled)
4. `<projectRoot>/.gnoma/config.toml` — project override
5. Environment variables (`GNOMA_PROVIDER`, `GNOMA_MODEL`, `*_API_KEY`)
Example `.gnoma/config.toml`:
Example global config:
```toml
[provider]
@@ -243,21 +162,165 @@ ollama = "http://localhost:11434/v1"
llamacpp = "http://localhost:8080/v1"
[permission]
mode = "auto" # auto | accept_edits | bypass | deny | plan
mode = "auto" # default | accept_edits | bypass | deny | plan | auto
[session]
max_keep = 20 # sessions retained per project
```
Environment variable overrides: `GNOMA_PROVIDER`, `GNOMA_MODEL`.
### Profiles
Drop multiple configs under `~/.config/gnoma/profiles/` and switch with
`--profile <name>` or `/profile <name>`. Each profile keeps its own router
quality data and session history. Full details: [docs/profiles.md](docs/profiles.md).
---
## Testing
## SLM (small-language-model) routing
```sh
make test # unit tests
make test-integration # integration tests (require real API keys)
make cover # coverage report → coverage.html
make lint # golangci-lint
make check # fmt + vet + lint + test
gnoma can run a tiny local model alongside the main provider to:
- **Classify** each prompt (task type + complexity + tool requirement) so the
router picks the right arm.
- **Execute** trivial tasks itself (knowledge questions, single file reads,
anything with complexity ≤ 0.3), keeping the heavy provider for real work.
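The execute-locally rule can be pictured as a small predicate over the classifier's verdict. This is a sketch only. The `classification` shape and the task-type labels `"knowledge"` and `"file_read"` are assumptions. The 0.3 ceiling is the one stated above.

```go
package main

// classification is a hypothetical shape for the SLM's per-prompt verdict.
type classification struct {
	TaskType   string  // assumed labels, e.g. "knowledge", "file_read", "refactor"
	Complexity float64 // 0.0 (trivial) to 1.0 (hard)
	NeedsTools bool    // consumed by arm selection; not used in this check
}

// localComplexityCeiling is the documented threshold for local execution.
const localComplexityCeiling = 0.3

// handleLocally reports whether the small model should answer the prompt
// itself instead of forwarding it to the main provider.
func handleLocally(c classification) bool {
	switch c.TaskType {
	case "knowledge", "file_read":
		return true
	}
	return c.Complexity <= localComplexityCeiling
}
```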
```toml
[slm]
enabled = true
backend = "auto" # ollama | llamacpp | llamafile | openaicompat | auto | disabled
model = "reecdev/tiny3.5:500m"
```
Integration tests are gated behind `//go:build integration` and skipped by default.
Setup, presets, and verification: [docs/slm-backends.md](docs/slm-backends.md).
The `auto` backend probes Ollama → llama.cpp → llamafile on startup and picks
the first reachable option. Inspect with `gnoma slm status` and
`gnoma router stats`.
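The `auto` probe is a first-reachable-wins scan over an ordered candidate list. The sketch below makes two assumptions. First, a backend counts as reachable if its OpenAI-compatible `/models` endpoint answers within a short timeout. Second, the caller supplies the base URLs in the order Ollama, llama.cpp, llamafile. `backend`, `pickBackend` and `httpReachable` are hypothetical names.

```go
package main

import (
	"net/http"
	"time"
)

// backend is a hypothetical candidate SLM server.
type backend struct {
	Name    string
	BaseURL string
}

// pickBackend returns the first candidate that reachable reports as up.
func pickBackend(candidates []backend, reachable func(baseURL string) bool) (backend, bool) {
	for _, b := range candidates {
		if reachable(b.BaseURL) {
			return b, true
		}
	}
	return backend{}, false
}

// httpReachable is one possible probe: GET <base>/models with a short
// timeout, treating any non-5xx response as "up".
func httpReachable(baseURL string) bool {
	client := &http.Client{Timeout: 500 * time.Millisecond}
	resp, err := client.Get(baseURL + "/models")
	if err != nil {
		return false
	}
	resp.Body.Close()
	return resp.StatusCode < 500
}
```

Injecting the probe function keeps the ordering logic testable without a live server.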
---
## Session persistence
Sessions are auto-saved per project under `.gnoma/sessions/<id>/` after each
completed turn. On a crash you lose at most the current in-flight turn.
```sh
gnoma --resume # interactive picker
gnoma --resume <id> # restore by ID
gnoma -r # shorthand
gnoma --incognito # no save, no router learning
```
Inside the TUI: `/resume`, `/resume <id>`, `Ctrl+X` (incognito toggle).
Router-quality data (EMA scores) is stored at
`~/.config/gnoma/quality.json` (or `quality-<profile>.json` in profile mode).
---
## Extensibility
### MCP servers
Connect any [MCP](https://modelcontextprotocol.io)-compatible server:
```toml
[[mcp_servers]]
name = "git"
command = "mcp-server-git"
args = ["--repo", "."]
timeout = "30s"
# Optionally replace a built-in tool with an MCP one
[mcp_servers.replace_default]
exec = "bash"
```
MCP tools appear as `mcp__{server}__{tool}` unless mapped via `replace_default`.
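The naming rule can be sketched as follows. This assumes that in `replace_default` the key is the MCP tool name and the value is the built-in it replaces, which is how the example above (`exec = "bash"`) reads. `toolName` is a hypothetical helper.

```go
package main

// toolName returns the name under which an MCP tool is exposed. If the
// server's replace_default table maps this tool onto a built-in, the
// built-in's name is used; otherwise it is namespaced as mcp__{server}__{tool}.
func toolName(server, tool string, replaceDefault map[string]string) string {
	if builtin, ok := replaceDefault[tool]; ok {
		return builtin
	}
	return "mcp__" + server + "__" + tool
}
```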
### Skills
Drop markdown files into `.gnoma/skills/` or `~/.config/gnoma/skills/`. Invoke
with `/<skill-name>`. List with `/skills`.
### Hooks
Shell commands run on tool events (`pre_tool_use`, `post_tool_use`, etc.):
```toml
[[hooks]]
name = "block-rm-rf"
event = "pre_tool_use"
type = "command"
exec = "bash-safety-check.sh"
tool_pattern = "bash*"
```
Ordering rules: [ADR-004](docs/essentials/decisions/004-posttooluse-hook-ordering.md).
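Selecting which hooks fire for a tool call can be sketched as an event filter plus a glob match on `tool_pattern`. This sketch assumes `path.Match` glob semantics and assumes that an empty pattern matches every tool. It keeps declaration order and does not model the ordering rules from ADR-004. `hook` and `matchingHooks` are hypothetical.

```go
package main

import "path"

// hook mirrors the subset of a [[hooks]] entry needed for matching.
type hook struct {
	Name        string
	Event       string
	ToolPattern string
}

// matchingHooks returns, in declaration order, the names of hooks that
// fire for the given event and tool.
func matchingHooks(hooks []hook, event, tool string) []string {
	var names []string
	for _, h := range hooks {
		if h.Event != event {
			continue
		}
		pattern := h.ToolPattern
		if pattern == "" {
			pattern = "*" // assumption: no pattern means every tool
		}
		// A malformed pattern returns an error; treat it as no match.
		ok, err := path.Match(pattern, tool)
		if err != nil || !ok {
			continue
		}
		names = append(names, h.Name)
	}
	return names
}
```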
### Plugins
Plugins bundle skills, hooks, and MCP server configs. Drop a plugin directory
into `~/.config/gnoma/plugins/` (global) or `<project>/.gnoma/plugins/`
(project-local); gnoma auto-discovers them on startup.
Each plugin's `plugin.json` is pinned by SHA-256 on first load
(Trust-On-First-Use). A manifest that changes between runs is refused with a
clear error and a re-enrolment hint. Full model:
[docs/plugins-trust.md](docs/plugins-trust.md) and
[ADR-003](docs/essentials/decisions/003-plugin-trust.md).
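The TOFU check reduces to "hash the manifest bytes; enrol on first sight; refuse on mismatch". Below is a minimal sketch with an in-memory pin map. Persistence and the actual re-enrolment command are out of scope, and `checkPin` is a hypothetical name. ADR-003 is the authoritative description.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checkPin applies trust-on-first-use to a plugin.json's raw bytes.
// pins maps plugin name to the hex SHA-256 recorded on first load.
func checkPin(pins map[string]string, plugin string, manifest []byte) error {
	sum := sha256.Sum256(manifest)
	got := hex.EncodeToString(sum[:])
	want, seen := pins[plugin]
	if !seen {
		pins[plugin] = got // first use: enrol the current hash
		return nil
	}
	if want != got {
		return fmt.Errorf("plugin %q: plugin.json changed since it was trusted (pinned %s, now %s); re-enrol to accept", plugin, want, got)
	}
	return nil
}
```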
### Elfs (sub-agents)
The `spawn_elfs` tool decomposes work into parallel sub-tasks. See
[`internal/skill/skills/batch.md`](internal/skill/skills/batch.md) for the
built-in batching skill.
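Conceptually, `spawn_elfs` is a fan-out/fan-in. The sketch below runs each sub-task in its own goroutine and collects results in input order. `runElfs` and the `task` callback are hypothetical stand-ins for a real sub-agent invocation. Each goroutine writes only its own slice index, so no mutex is needed.

```go
package main

import "sync"

// runElfs runs every sub-task concurrently and returns results in input order.
func runElfs(tasks []string, task func(string) string) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, t := range tasks {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			results[i] = task(t)
		}(i, t)
	}
	wg.Wait()
	return results
}
```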
---
## Subcommands
| Command | What it does |
|---|---|
| `gnoma providers` | List every discovered provider, model, and CLI agent |
| `gnoma profile list` / `show <name>` | Profile diagnostics |
| `gnoma router stats` | Quality EMA + classifier source breakdown |
| `gnoma slm setup` / `slm status` | Manage the llamafile-backed SLM |
`gnoma --help` for the full flag set.
---
## Security
gnoma runs tools and shell commands on your behalf. The
[`internal/security`](internal/security) package canonicalises every path
(TOCTOU-safe), gates network access through a configurable firewall, and
scans tool output for secrets before it ever reaches the model. The
`SafeProvider` boundary keeps incognito-mode data out of long-lived stores.
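To illustrate the output-scanning step, here is a toy regex-based redactor. The three patterns are illustrative assumptions only, not the rule set in `internal/security`. A real scanner needs a much larger, maintained set of rules.

```go
package main

import "regexp"

// secretPatterns is a deliberately tiny, illustrative rule set.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),                     // AWS access key ID shape
	regexp.MustCompile(`-----BEGIN [A-Z ]*PRIVATE KEY-----`),   // PEM private key header
	regexp.MustCompile(`(?i)(api[_-]?key|token)\s*[:=]\s*\S+`), // generic key=value
}

// redact replaces every match with a placeholder and reports whether
// anything was found, so the caller can decide what reaches the model.
func redact(output string) (string, bool) {
	found := false
	for _, re := range secretPatterns {
		if re.MatchString(output) {
			found = true
			output = re.ReplaceAllString(output, "[REDACTED]")
		}
	}
	return output, found
}
```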
Architecture references:
- [docs/essentials/INDEX.md](docs/essentials/INDEX.md) — full architecture map
- [docs/essentials/decisions/](docs/essentials/decisions/) — ADRs 001–004
---
## Development
```sh
make build # ./bin/gnoma
make test # unit tests
make test-integration # //go:build integration — requires real API keys
make cover # coverage.html
make lint # golangci-lint
make check # fmt + vet + lint + test
```
Architecture, conventions, and TDD workflow: [CONTRIBUTING.md](CONTRIBUTING.md).
---
## License
Apache License 2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).
+43 -41
@@ -1,51 +1,53 @@
# Gnoma — TODO
Active plans, newest first:
Active work, newest first.
- **Post-audit security hardening** — **complete (2026-05-19)**. All 14
findings from the external review are closed across three waves +
one ADR:
## In flight
- **Distribution** — `.goreleaser.yml` is configured for
`linux`/`darwin`/`windows` × `amd64`/`arm64`. Still pending: first
tag + release pipeline trigger, optional Homebrew tap and Docker
image, mirror release publishing to GitHub.
- **Compound tools (post-SLM Phase E)** — held until ≥50 SLM
observations inform which primitives are worth adding. See
[`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
## Stable backlog (not in active phases)
- **Thinking mode** (disabled / budget / adaptive) — M12.
- **Structured output** with JSON schema validation — M12.
- **Native agy JSON output** — switch the subprocess provider to
`--output-format stream-json` once the agy CLI supports it,
replacing the current prompt-augmentation fallback.
- **SQLite session persistence** + serve mode — M10.
- **Task learning** (pattern recognition, persistent tasks) — M11.
- **Web UI** (`gnoma web`) — M15.
- **OAuth / keyring** — M13.
- **Observability** (feature flags, cost dashboards) — M14.
- **PE / Mach-O support** (beyond ELF) — future, after ELF Phase 6.
## History
Completed initiatives, kept here as pointers to their plan files:
- **Post-audit security hardening** — complete 2026-05-19. Three waves
+ one ADR closed all 14 findings from the external review:
- [Wave 1 — SafeProvider boundary](docs/superpowers/plans/2026-05-19-security-wave1-safeprovider.md)
- [Wave 2 — Incognito coherence](docs/superpowers/plans/2026-05-19-security-wave2-incognito.md)
- [Wave 3 — Scanner + path hygiene](docs/superpowers/plans/2026-05-19-security-wave3-scanner-paths.md)
- Wave 3 — scanner + path hygiene (rolled out directly without a
plan file; see commits leading up to 2026-05-19 on `internal/security`)
- [ADR-004 — PostToolUse hook ordering](docs/essentials/decisions/004-posttooluse-hook-ordering.md)
- **[`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md)**
— outstanding work after the SLM unlock session. Phases A (two-stage
tool routing), B (CLI agent binary override), C (user profiles), and
D (per-arm capability tags) are **complete**. Phase E (compound
tools) is held until ≥50 SLM observations inform which primitives are
worth adding.
- **[`docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md)**
— broader roadmap (PTY shell, USP integration, ELF, distribution).
Phase 4 ("Router Revisit") is superseded by the post-SLM plan above.
- **Post-SLM unlock** —
[plan](docs/superpowers/plans/2026-05-19-post-slm-unlock.md). Phases
A–D complete (two-stage tool routing, CLI agent binary override,
user profiles, per-arm capability tags).
- **2026-05-07 roadmap** —
[plan](docs/superpowers/plans/2026-05-07-gnoma-roadmap.md). M1–M8
done; SLM classifier (Phase 3) complete; Phase 4 superseded by the
post-SLM plan.
Phases (2026-05-07 roadmap):
1. M8 Cleanup (wiring gaps)
2. PTY Interactive Shell (`tea.ExecProcess`)
3. SLM Task Classifier (Ollama HTTP, opt-in) — **complete**
4. Router Revisit — **superseded by post-SLM plan**
5. USP Security Integration
6. ELF Binary Support (deferred/opportunistic)
7. Distribution (CI trigger for goreleaser)
---
## Stable Backlog (not in active phases)
- **Thinking mode** (disabled / budget / adaptive) — M12 in milestones
- **Structured output** with JSON schema validation — M12
- **Native agy JSON output** — update subprocess provider to use `--output-format stream-json` once supported by agy CLI, replacing the current prompt-augmentation fallback.
- **SQLite session persistence** + serve mode — M10
- **Task learning** (pattern recognition, persistent tasks) — M11
- **Web UI** (`gnoma web`) — M15
- **OAuth / keyring** — M13
- **Observability** (feature flags, cost dashboards) — M14
- **PE / Mach-O support** — future, after ELF Phase 6
---
## Architecture References
## Reference
- Milestones: `docs/essentials/milestones.md`
- Decisions: `docs/essentials/decisions/`
- ADR-013 (SLM routing, supersedes ADR-009): `docs/essentials/decisions/002-slm-routing.md`
- ADR-002 (SLM routing, supersedes earlier ADR-009): `docs/essentials/decisions/002-slm-routing.md`
+1
@@ -39,3 +39,4 @@ essentials:
- [ADR-001 — Initial Decisions](decisions/001-initial-decisions.md)
- [ADR-002 — SLM Routing](decisions/002-slm-routing.md)
- [ADR-003 — Plugin Trust via TOFU Manifest Pinning](decisions/003-plugin-trust.md)
- [ADR-004 — PostToolUse Hook Ordering](decisions/004-posttooluse-hook-ordering.md)
-160
@@ -1,160 +0,0 @@
> **Note (2026-05-07):** This document describes the `gemini-cli` (Node.js) implementation.
> The specifics — LiteRT-LM runtime, daemon/PID management, `litert-lm pull`, React/Ink UI —
> are Node.js artifacts and do not apply to gnoma. The **conceptually relevant part** is the
> Complexity Rubric and the `GemmaClassifierStrategy` JSON interface, which informed the Go
> `SLMClassifier` design in Phase 3 of `docs/superpowers/plans/2026-05-07-gnoma-roadmap.md`.
> For the Go implementation, see ADR-013 (`docs/essentials/decisions/002-slm-routing.md`).
# Gemini CLI Local Model Routing (/gemma) Architecture
The `/gemma` integration in the `gemini-cli` uses a local LLM to perform "Model Routing". It automatically decides whether to use a cheaper/faster model (Flash) or a more powerful one (Pro) based on the user's request.
## Core Architecture
* **Engine:** Uses **LiteRT-LM**, a lightweight runtime that serves Gemma models via a Gemini-compatible HTTP API.
* **Model:** Specifically uses a quantized **Gemma 3 1B** model (`gemma3-1b-gpu-custom`). It's ~1GB and runs locally with low latency (~100-200ms for classification).
* **Orchestration:** The CLI manages the LiteRT server as a background daemon, tracking its state via PID files and logs.
* **Integration:** A `GemmaClassifierStrategy` is injected into the core `ModelRouterService`. It flattens recent chat history, sends it to the local Gemma model with a strict "Complexity Rubric," and uses the JSON response to switch models dynamically.
---
## Integration Todo List
### 1. Infrastructure & Asset Management
- [ ] **Platform Detection:** Logic to map OS/Arch to the correct LiteRT-LM binary download URL.
- [ ] **Safe Installer:** Implementation of binary download + SHA256 checksum verification + permission handling (`chmod +x`, macOS quarantine removal).
- [ ] **Model Manager:** Wrapper for the `litert-lm pull` command to download and verify the 1GB Gemma model.
### 2. Process & Server Management
- [ ] **Background Daemon:** Implementation of `spawn(..., { detached: true })` to keep the LiteRT server running independently of the CLI session.
- [ ] **State Tracking:** A PID-file system to manage server lifecycle (start/stop/status) and prevent port collisions.
- [ ] **Auto-Start Logic:** A manager class (`LiteRtServerManager`) that checks server health on CLI startup and launches it if enabled in settings.
### 3. Routing Logic (The "Brain")
- [ ] **Complexity Rubric:** A specialized system prompt that defines what constitutes a "SIMPLE" vs "COMPLEX" task.
- [ ] **Context Flattener:** Utility to compress the last ~4-20 turns of chat history into a prompt suitable for a small 1B model.
- [ ] **Strategy Implementation:** The `GemmaClassifierStrategy` class to handle the local API call, parse the JSON "reasoning," and return the model decision.
### 4. User Experience (CLI & UI)
- [ ] **Management Commands:** Commands like `gemini gemma {setup|start|stop|status|logs}` for lifecycle and troubleshooting.
- [ ] **Slash Command:** A built-in `/gemma` command that queries the local server health and displays a status panel inside a session.
- [ ] **React/Ink UI:** A status component to show visual indicators (green/red) for the binary, model, and server state.
### 5. Configuration & Safety
- [ ] **Scoped Settings:** Separate "User" settings (binary path) from "Workspace" settings (router enabled/disabled for a specific project).
- [ ] **Failure Resilience:** Logic to gracefully fall back to the default model if the local classifier times out or fails.
---
## Routing Prompts
These are the exact prompts used by the `gemini-cli` to force the small 1B model to output structured JSON with strict reasoning criteria.
### 1. The Complexity Rubric
```markdown
### Complexity Rubric
A task is COMPLEX (Choose \`pro\`) if it meets ONE OR MORE of the following criteria:
1. **High Operational Complexity (Est. 4+ Steps/Tool Calls):** Requires dependent actions, significant planning, or multiple coordinated changes.
2. **Strategic Planning & Conceptual Design:** Asking "how" or "why." Requires advice, architecture, or high-level strategy.
3. **High Ambiguity or Large Scope (Extensive Investigation):** Broadly defined requests requiring extensive investigation.
4. **Deep Debugging & Root Cause Analysis:** Diagnosing unknown or complex problems from symptoms.
A task is SIMPLE (Choose \`flash\`) if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls). Operational simplicity overrides strategic phrasing.
```
### 2. Output Format Enforcement
```markdown
### Output Format
Respond *only* in JSON format like this:
{
"reasoning": Your reasoning...
"model_choice": Either flash or pro
}
And you must follow the following JSON schema:
{
"type": "object",
"properties": {
"reasoning": {
"type": "string",
"description": "A brief summary of the user objective, followed by a step-by-step explanation for the model choice, referencing the rubric."
},
"model_choice": {
"type": "string",
"enum": ["flash", "pro"]
}
},
"required": ["reasoning", "model_choice"]
}
You must ensure that your reasoning is no more than 2 sentences long and directly references the rubric criteria.
When making your decision, the user's request should be weighted much more heavily than the surrounding context when making your determination.
```
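The note at the top of this file says this JSON contract informed the Go `SLMClassifier`. Below is a minimal sketch of consuming it in Go. It enforces the `model_choice` enum and falls back to a caller-supplied default on any failure, in line with the "Failure Resilience" item. `routingDecision` and `parseDecision` are hypothetical names, not gnoma's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routingDecision matches the schema's two required fields.
type routingDecision struct {
	Reasoning   string `json:"reasoning"`
	ModelChoice string `json:"model_choice"`
}

// parseDecision decodes the classifier reply and validates the enum.
// On any failure it returns fallback together with the error.
func parseDecision(raw []byte, fallback string) (string, error) {
	var d routingDecision
	if err := json.Unmarshal(raw, &d); err != nil {
		return fallback, fmt.Errorf("decode classifier reply: %w", err)
	}
	switch d.ModelChoice {
	case "flash", "pro":
		return d.ModelChoice, nil
	default:
		return fallback, fmt.Errorf("model_choice %q not in [flash pro]", d.ModelChoice)
	}
}
```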
### 3. The Main System Prompt
```markdown
### Role
You are the **Lead Orchestrator** for an AI system. You do not talk to users. Your sole responsibility is to analyze the **Chat History** and delegate the **Current Request** to the most appropriate **Model** based on the request's complexity.
### Models
Choose between \`flash\` (SIMPLE) or \`pro\` (COMPLEX).
1. \`flash\`: A fast, efficient model for simple, well-defined tasks.
2. \`pro\`: A powerful, advanced model for complex, open-ended, or multi-step tasks.
[... Injects COMPLEXITY_RUBRIC here ...]
[... Injects OUTPUT_FORMAT here ...]
### Examples
**Example 1 (Strategic Planning):**
*User Prompt:* "How should I architect the data pipeline for this new analytics service?"
*Your JSON Output:*
{
"reasoning": "The user is asking for high-level architectural design and strategy. This falls under 'Strategic Planning & Conceptual Design'.",
"model_choice": "pro"
}
**Example 2 (Simple Tool Use):**
*User Prompt:* "list the files in the current directory"
*Your JSON Output:*
{
"reasoning": "This is a direct command requiring a single tool call (ls). It has Low Operational Complexity (1 step).",
"model_choice": "flash"
}
**Example 3 (High Operational Complexity):**
*User Prompt:* "I need to add a new 'email' field to the User schema in 'src/models/user.ts', migrate the database, and update the registration endpoint."
*Your JSON Output:*
{
"reasoning": "This request involves multiple coordinated steps across different files and systems. This meets the criteria for High Operational Complexity (4+ steps).",
"model_choice": "pro"
}
**Example 4 (Simple Read):**
*User Prompt:* "Read the contents of 'package.json'."
*Your JSON Output:*
{
"reasoning": "This is a direct command requiring a single read. It has Low Operational Complexity (1 step).",
"model_choice": "flash"
}
**Example 5 (Deep Debugging):**
*User Prompt:* "I'm getting an error 'Cannot read property 'map' of undefined' when I click the save button. Can you fix it?"
*Your JSON Output:*
{
"reasoning": "The user is reporting an error symptom without a known cause. This requires investigation and falls under 'Deep Debugging'.",
"model_choice": "pro"
}
**Example 6 (Simple Edit despite Phrasing):**
*User Prompt:* "What is the best way to rename the variable 'data' to 'userData' in 'src/utils.js'?"
*Your JSON Output:*
{
"reasoning": "Although the user uses strategic language ('best way'), the underlying task is a localized edit. The operational complexity is low (1-2 steps).",
"model_choice": "flash"
}
```
### 4. The Per-Request Prompt Structure
For every routing decision, the CLI flattens the last ~4 turns of chat history and appends the new user request.
```markdown
You are provided with a **Chat History** and the user's **Current Request** below.
#### Chat History:
[... Flattened text of the last 4 turns, excluding tool calls ...]
#### Current Request:
"[... The actual text of what the user just typed ...]"
```