Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
subsequent argument deltas — fix: use json.Valid to detect completeness
(see the sketch after this list)
- Missing tool_choice default — llama.cpp needs explicit "auto" to
activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
used API format (fs_ls) — now re-sanitized in translateMessage
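A minimal sketch of the json.Valid fix, assuming a streaming accumulator roughly like the one described in the first bullet (the type and field names here are illustrative, not gnoma's actual internals):

package main

import (
	"encoding/json"
	"fmt"
)

// toolCallBuffer accumulates streamed tool-call argument deltas.
type toolCallBuffer struct {
	args         []byte
	argsComplete bool
}

// addDelta appends one streamed fragment. The arguments are marked
// complete only once the accumulated bytes parse as valid JSON; a
// bare "{" fails json.Valid, so later deltas are still appended
// instead of being dropped.
func (b *toolCallBuffer) addDelta(delta string) {
	if b.argsComplete {
		return
	}
	b.args = append(b.args, delta...)
	b.argsComplete = json.Valid(b.args)
}

func main() {
	var b toolCallBuffer
	for _, d := range []string{`{`, `"path"`, `: "."`, `}`} {
		b.addDelta(d)
	}
	fmt.Println(string(b.args), b.argsComplete) // {"path": "."} true
}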
Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
(Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
replaced with size markers to stay within small context windows (see
the sketch after this list)
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
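The compaction idea from the tool-result bullet above, as a rough sketch (the Message type and the marker format are illustrative, not gnoma's actual internals):

package main

import "fmt"

// Message is an illustrative stand-in for gnoma's history entry type.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// compactToolResults replaces tool results from earlier rounds with
// short size markers so a small local context window is not spent on
// stale output. Only the last keepLast messages are left intact.
func compactToolResults(history []Message, keepLast int) {
	cutoff := len(history) - keepLast
	for i := range history {
		if i < cutoff && history[i].Role == "tool" {
			history[i].Content = fmt.Sprintf("[tool result elided: %d bytes]", len(history[i].Content))
		}
	}
}

func main() {
	h := []Message{
		{Role: "tool", Content: "...4 KB of ls output..."},
		{Role: "assistant", Content: "done"},
		{Role: "user", Content: "now build it"},
	}
	compactToolResults(h, 2)
	fmt.Println(h[0].Content) // prints the size marker
}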
gnoma
A provider-agnostic agentic coding assistant built in Go. gnoma routes tasks to the best available LLM — cloud or local — through a multi-armed bandit router, while tools, hooks, skills, MCP servers, and plugins keep it extensible. Named after the northern pygmy-owl (Glaucidium gnoma); its agents are called elfs, after the elf owl.
Quickstart
# Install
go install somegit.dev/Owlibou/gnoma/cmd/gnoma@latest
# Or build from source
git clone https://somegit.dev/Owlibou/gnoma && cd gnoma
make build # binary at ./bin/gnoma
# Set at least one provider key
export ANTHROPIC_API_KEY=sk-ant-... # or OPENAI_API_KEY, MISTRAL_API_KEY, GEMINI_API_KEY
# Run
gnoma # interactive TUI
echo "list files" | gnoma # pipe mode
gnoma --provider ollama # use a local model
Build
make build # ./bin/gnoma
make install # $GOPATH/bin/gnoma
Providers
Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
./bin/gnoma --provider anthropic
./bin/gnoma --provider anthropic --model claude-opus-4-5-20251001
Integration tests hit the real API — keep a key in env:
go test -tags integration ./internal/provider/...
OpenAI
export OPENAI_API_KEY=sk-proj-...
./bin/gnoma --provider openai
./bin/gnoma --provider openai --model gpt-4o
Mistral
export MISTRAL_API_KEY=...
./bin/gnoma --provider mistral
Google (Gemini)
export GEMINI_API_KEY=AIza...
./bin/gnoma --provider google
./bin/gnoma --provider google --model gemini-2.0-flash
Ollama (local)
Start Ollama and pull a model, then:
./bin/gnoma --provider ollama --model gemma4:latest
./bin/gnoma --provider ollama --model qwen3:8b # default if --model omitted
Default endpoint: http://localhost:11434/v1. Override via config or env:
# .gnoma/config.toml
[provider]
default = "ollama"
model = "gemma4:latest"
[provider.endpoints]
ollama = "http://myhost:11434/v1"
llama.cpp (local)
Start the llama.cpp server:
llama-server --model /path/to/model.gguf --port 8080 --ctx-size 8192
Then:
./bin/gnoma --provider llamacpp
# model name is taken from the server's /v1/models response
Default endpoint: http://localhost:8080/v1. Override:
[provider.endpoints]
llamacpp = "http://localhost:9090/v1"
Extensibility (M8)
gnoma supports hooks, skills, MCP servers, and plugins.
MCP Servers
Connect any MCP-compatible tool server:
[[mcp_servers]]
name = "git"
command = "mcp-server-git"
args = ["--repo", "."]
timeout = "30s"
# Replace a built-in tool with an MCP tool
[mcp_servers.replace_default]
exec = "bash" # MCP tool "exec" replaces gnoma's built-in "bash"
MCP tools appear as mcp__{server}__{tool} (e.g., mcp__git__status), or under the built-in name when using replace_default.
Skills
Drop markdown files into .gnoma/skills/ or ~/.config/gnoma/skills/:
/skillname # invoke a skill
/skills # list available skills
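For example, a hypothetical skill at .gnoma/skills/review.md whose body becomes the prompt (the file name maps to the slash command, per the /skillname convention above):

Review the staged diff for bugs, missing error handling, and style
issues. Report findings as a bullet list, most severe first.

With that file in place, /review invokes it.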
Hooks
Run shell commands on tool events:
[[hooks]]
name = "block-rm-rf"
event = "pre_tool_use"
type = "command"
exec = "bash-safety-check.sh"
tool_pattern = "bash*"
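A sketch of what bash-safety-check.sh might do, assuming the hook receives the tool invocation on stdin and that a nonzero exit status blocks the call (both are assumptions; verify against gnoma's actual hook contract):

#!/usr/bin/env sh
# Hypothetical pre_tool_use hook. Assumes the tool invocation
# arrives on stdin and a nonzero exit blocks the call; check
# gnoma's hook contract before relying on either.
input=$(cat)
case "$input" in
  *'rm -rf /'*)
    echo "blocked: refusing destructive rm" >&2
    exit 1
    ;;
esac
exit 0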
Plugins
Bundle skills, hooks, and MCP configs into installable plugins:
gnoma plugin install ./my-plugin # install from directory
gnoma plugin list # list installed plugins
Session Persistence
Conversations are auto-saved to .gnoma/sessions/ after each completed turn. On a crash you lose at most the current in-flight turn; all previously completed turns are safe.
Resume a session
gnoma --resume # interactive session picker (↑↓ navigate, Enter load, Esc cancel)
gnoma --resume <id> # restore directly by ID
gnoma -r # shorthand
Inside the TUI:
/resume # open picker
/resume <id> # restore by ID
Incognito mode
gnoma --incognito # no session saved, no quality scores updated
Toggle at runtime with Ctrl+X.
Session config
[session]
max_keep = 20 # how many sessions to retain per project (default: 20)
Sessions are stored per-project under .gnoma/sessions/<id>/. Quality scores (EMA routing data) are stored globally at ~/.config/gnoma/quality.json.
Config
Config is read in priority order:
- ~/.config/gnoma/config.toml — global
- .gnoma/config.toml — project-local (next to go.mod/.git)
- Environment variables
Example .gnoma/config.toml:
[provider]
default = "anthropic"
model = "claude-sonnet-4-6"
[provider.api_keys]
anthropic = "${ANTHROPIC_API_KEY}"
[provider.endpoints]
ollama = "http://localhost:11434/v1"
llamacpp = "http://localhost:8080/v1"
[permission]
mode = "auto" # auto | accept_edits | bypass | deny | plan
Environment variable overrides: GNOMA_PROVIDER, GNOMA_MODEL.
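For example, to pick a provider and model for a single run without touching config:

GNOMA_PROVIDER=ollama GNOMA_MODEL=qwen3:8b gnoma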
Testing
make test # unit tests
make test-integration # integration tests (require real API keys)
make cover # coverage report → coverage.html
make lint # golangci-lint
make check # fmt + vet + lint + test
Integration tests are gated behind //go:build integration and skipped by default.
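The gate is a standard Go build constraint at the top of each integration test file, for example (path and package name illustrative):

//go:build integration

package provider_test

// Compiled and run only with: go test -tags integration ./internal/provider/...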