From 488201b908fdb506f81c4cd300a3b4febfa8ed38 Mon Sep 17 00:00:00 2001
From: vikingowl <26+vikingowl@noreply.somegit.dev>
Date: Thu, 7 May 2026 00:21:52 +0200
Subject: [PATCH] docs: add TODO roadmap for gemma routing, USP integration, local tmp, and ELF support

---
 TODO.md                       |  72 +++++++++++++++-
 gemma-integration-analysis.md | 153 ++++++++++++++++++++++++++++++++++
 2 files changed, 222 insertions(+), 3 deletions(-)
 create mode 100644 gemma-integration-analysis.md

diff --git a/TODO.md b/TODO.md
index 68dd2d7..e20fe8c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,7 +1,73 @@
-# Gnoma ELF Support - TODO List
+# Gnoma - TODO List
 
-## Overview
-This document outlines the steps to add **ELF (Executable and Linkable Format)** support to Gnoma, enabling features like ELF parsing, disassembly, security analysis, and binary manipulation.
+---
+
+## Gemma Integration (Local Model Routing)
+
+See [`gemma-integration-analysis.md`](gemma-integration-analysis.md) for full architecture analysis, routing prompts, and implementation checklist.
+
+- [ ] Infrastructure & asset management (platform detection, safe installer, model manager)
+- [ ] Process & server management (background daemon, state tracking, auto-start)
+- [ ] Routing logic (complexity rubric, context flattener, strategy implementation)
+- [ ] UX (management commands, slash command, status UI)
+- [ ] Configuration & safety (scoped settings, failure resilience)
+
+---
+
+## Built-in Security Pilot (USP Integration)
+
+### Overview
+Ship the [Universal Security Pilot](https://github.com/VikingOwl91/universal-security-pilot) capabilities as first-class features in gnoma's core, rather than relying on external Markdown files and tool-specific adapters. Gnoma becomes the runtime for USP: the audit engine, remediation workflow, and AI hardening logic live inside the binary.
+
+### Core Capabilities to Internalize
+- [ ] **Security audit engine**: the eight-rule zero-trust review (adversarial input, context-aware footguns, identity integrity, atomicity, secret hygiene, AI guarding, SSRF/Dial-Control, multilingual defense)
+- [ ] **Wave Protocol enforcement**: mandatory remediation ordering (W0→W1→W2→W3→W4→W5→W6), blast-radius-descending within each wave, cross-wave dependency resolution
+- [ ] **Iron Law**: no fix ships without a failing PoC test; enforce this in the remediation workflow
+- [ ] **Standards citation**: every finding must map to OWASP Top 10 / ASVS / LLM Top 10 / MITRE ATLAS / CWE IDs
+- [ ] **AI hardening**: six-axis LLM hardening (prompt boundaries, output sanitization, BudgetGate, Dial-Control, injection vectors, multilingual defense)
+
+### Implementation Steps
+- [ ] **Skill system**: implement `sec-audit`, `sec-fix`, `ai-harden`, `sec-init` as built-in gnoma skills (not external file reads)
+- [ ] **Footgun library**: embed the universal footgun catalog (categories A–D) and framework-specific instances as structured data gnoma can query during audits
+- [ ] **Severity grading**: Critical/High/Medium/Low/Info with the canonical definitions, used in audit report output
+- [ ] **Complexity rubric**: language-specific footgun tables (Go, TS/JS, Rust, Python, etc.) as queryable rules
+- [ ] **Canonical patterns**: ship BudgetGate, Dial-Control, Envelope Encryption, OIDC state-verification as referenceable code templates gnoma can suggest or scaffold
+- [ ] **Project-local override**: support `.gnoma/security/project-pilot.toml` (or similar) for per-project tightening (never loosening)
+- [ ] **Rationalization resistance**: the anti-pressure table from `sec-fix` ("approved", "rushed deadline" do not override discipline)
+- [ ] **Report generation**: structured Markdown audit reports with standards citations, severity, and wave assignment
+
+### Considerations
+- USP is tool-agnostic by design; gnoma's implementation should preserve the framework's principles while making them native
+- The Wave Protocol ordering is load-bearing: W1 (auth) must complete before W2 (network), etc.
+- Project-local overrides can tighten but never loosen the canonical rules
+- Embed the footgun library as Go structs, not as runtime-parsed Markdown
+
+---
+
+## Local Tmp Folder (`.gnoma/tmp/`)
+
+### Overview
+Per-project temporary directory at `.gnoma/tmp/[current-working-dir]` for scratch files, intermediate outputs, and ephemeral state that shouldn't pollute the project tree or system tmp.
+
+### Implementation Steps
+- [ ] Create `.gnoma/tmp/` directory structure on first use (lazy initialization)
+- [ ] Derive subdirectory name from current working directory (hash or sanitized path)
+- [ ] Add helpers to resolve tmp paths: `gnoma.TmpDir(cwd string) string`
+- [ ] Auto-cleanup policy (e.g., prune entries older than N days, or on session end)
+- [ ] Add `.gnoma/tmp/` to default `.gitignore` generation
+- [ ] Use for tool scratch space (e.g., ELF analysis intermediates, diff staging, etc.)
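
The path-derivation and lazy-initialization steps above could be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not gnoma's implementation: it assumes `.gnoma/` sits directly under a known project root, picks the "hash" option from the list (first 16 hex characters of a SHA-256 over the cleaned absolute path), and introduces the hypothetical helpers `TmpDirIn` and `EnsureTmpDir`. Only the `TmpDir(cwd string) string` signature comes from the list itself.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// TmpDirIn returns the scratch directory for cwd under root/.gnoma/tmp/.
// The subdirectory name is a truncated SHA-256 of the cleaned absolute cwd,
// so the same working directory always maps to the same name.
// (Hypothetical helper; the hash choice and 16-character length are assumptions.)
func TmpDirIn(root, cwd string) string {
	abs, err := filepath.Abs(cwd) // Abs also cleans the path
	if err != nil {
		abs = filepath.Clean(cwd)
	}
	sum := sha256.Sum256([]byte(abs))
	name := hex.EncodeToString(sum[:])[:16]
	return filepath.Join(root, ".gnoma", "tmp", name)
}

// TmpDir matches the signature named in the TODO. It assumes cwd is also the
// project root; a real implementation would likely locate the root first.
func TmpDir(cwd string) string {
	return TmpDirIn(cwd, cwd)
}

// EnsureTmpDir creates the directory on first use (lazy initialization).
// Hypothetical helper, not part of the TODO.
func EnsureTmpDir(cwd string) (string, error) {
	dir := TmpDir(cwd)
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", err
	}
	return dir, nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Only resolves the path; nothing is created until EnsureTmpDir is called.
	fmt.Println(TmpDir(cwd))
}
```

Hashing keeps the directory name fixed-length and deterministic for a given path. It does not by itself address collisions between concurrent gnoma instances working in the same directory.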
+
+### Considerations
+- Avoid collisions when multiple gnoma instances target the same project
+- Keep path derivation deterministic so the same project always maps to the same tmp dir
+- Respect XDG conventions where applicable (fallback to `~/.gnoma/tmp/` if no project-local `.gnoma/`)
+
+---
+
+## ELF Support
+
+### Overview
+This section outlines the steps to add **ELF (Executable and Linkable Format)** support to Gnoma, enabling features like ELF parsing, disassembly, security analysis, and binary manipulation.
 
 ---
 
diff --git a/gemma-integration-analysis.md b/gemma-integration-analysis.md
new file mode 100644
index 0000000..1e510e3
--- /dev/null
+++ b/gemma-integration-analysis.md
@@ -0,0 +1,153 @@
+# Gemini CLI Local Model Routing (/gemma) Architecture
+
+The `/gemma` integration in the `gemini-cli` uses a local LLM to perform "Model Routing". It automatically decides whether to use a cheaper/faster model (Flash) or a more powerful one (Pro) based on the user's request.
+
+## Core Architecture
+* **Engine:** Uses **LiteRT-LM**, a lightweight runtime that serves Gemma models via a Gemini-compatible HTTP API.
+* **Model:** Specifically uses a quantized **Gemma 3 1B** model (`gemma3-1b-gpu-custom`). It's ~1GB and runs locally with low latency (~100-200ms for classification).
+* **Orchestration:** The CLI manages the LiteRT server as a background daemon, tracking its state via PID files and logs.
+* **Integration:** A `GemmaClassifierStrategy` is injected into the core `ModelRouterService`. It flattens recent chat history, sends it to the local Gemma model with a strict "Complexity Rubric," and uses the JSON response to switch models dynamically.
+
+---
+
+## Integration Todo List
+
+### 1. Infrastructure & Asset Management
+- [ ] **Platform Detection:** Logic to map OS/Arch to the correct LiteRT-LM binary download URL.
+- [ ] **Safe Installer:** Implementation of binary download + SHA256 checksum verification + permission handling (`chmod +x`, macOS quarantine removal).
+- [ ] **Model Manager:** Wrapper for the `litert-lm pull` command to download and verify the 1GB Gemma model.
+
+### 2. Process & Server Management
+- [ ] **Background Daemon:** Implementation of `spawn(..., { detached: true })` to keep the LiteRT server running independently of the CLI session.
+- [ ] **State Tracking:** A PID-file system to manage server lifecycle (start/stop/status) and prevent port collisions.
+- [ ] **Auto-Start Logic:** A manager class (`LiteRtServerManager`) that checks server health on CLI startup and launches it if enabled in settings.
+
+### 3. Routing Logic (The "Brain")
+- [ ] **Complexity Rubric:** A specialized system prompt that defines what constitutes a "SIMPLE" vs "COMPLEX" task.
+- [ ] **Context Flattener:** Utility to compress the last ~4-20 turns of chat history into a prompt suitable for a small 1B model.
+- [ ] **Strategy Implementation:** The `GemmaClassifierStrategy` class to handle the local API call, parse the JSON "reasoning," and return the model decision.
+
+### 4. User Experience (CLI & UI)
+- [ ] **Management Commands:** Commands like `gemini gemma {setup|start|stop|status|logs}` for lifecycle and troubleshooting.
+- [ ] **Slash Command:** A built-in `/gemma` command that queries the local server health and displays a status panel inside a session.
+- [ ] **React/Ink UI:** A status component to show visual indicators (green/red) for the binary, model, and server state.
+
+### 5. Configuration & Safety
+- [ ] **Scoped Settings:** Separate "User" settings (binary path) from "Workspace" settings (router enabled/disabled for a specific project).
+- [ ] **Failure Resilience:** Logic to gracefully fall back to the default model if the local classifier times out or fails.
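
The "Strategy Implementation" and "Failure Resilience" items above could be sketched in Go (gnoma's language) roughly as follows. This is a sketch under stated assumptions, not the `gemini-cli` code: the `Classifier` type stands in for the HTTP call to the local server, and the stub classifiers, the `"default"` fallback label, and the 50 ms timeout are all hypothetical. Only the `reasoning` / `model_choice` fields and the `flash` / `pro` values come from the routing prompts.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Decision mirrors the JSON object the routing prompt asks the model to emit.
type Decision struct {
	Reasoning   string `json:"reasoning"`
	ModelChoice string `json:"model_choice"`
}

// Classifier is a hypothetical stand-in for the call to the local Gemma server.
type Classifier func(ctx context.Context, prompt string) (string, error)

// ParseDecision decodes the raw reply and rejects any model_choice other
// than "flash" or "pro".
func ParseDecision(raw string) (Decision, error) {
	var d Decision
	if err := json.Unmarshal([]byte(raw), &d); err != nil {
		return Decision{}, err
	}
	if d.ModelChoice != "flash" && d.ModelChoice != "pro" {
		return Decision{}, fmt.Errorf("unexpected model_choice %q", d.ModelChoice)
	}
	return d, nil
}

// Route returns the classifier's choice, or fallback when the classifier
// errors, times out, or returns an invalid reply.
func Route(ctx context.Context, classify Classifier, prompt, fallback string, timeout time.Duration) string {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	type result struct {
		raw string
		err error
	}
	ch := make(chan result, 1) // buffered so the goroutine never blocks on send
	go func() {
		raw, err := classify(ctx, prompt)
		ch <- result{raw, err}
	}()

	select {
	case <-ctx.Done():
		return fallback
	case r := <-ch:
		if r.err != nil {
			return fallback
		}
		d, err := ParseDecision(r.raw)
		if err != nil {
			return fallback
		}
		return d.ModelChoice
	}
}

// Stub classifiers for the demo (all hypothetical).
func stubFlash(context.Context, string) (string, error) {
	return `{"reasoning": "Single read, low operational complexity.", "model_choice": "flash"}`, nil
}

func stubDown(context.Context, string) (string, error) {
	return "", errors.New("connection refused")
}

func stubHang(ctx context.Context, _ string) (string, error) {
	<-ctx.Done() // simulates a server that never answers in time
	return "", ctx.Err()
}

// routeDemo applies fixed demo settings: fallback "default", 50 ms timeout.
func routeDemo(c Classifier) string {
	return Route(context.Background(), c, "Read the contents of 'package.json'.", "default", 50*time.Millisecond)
}

func main() {
	fmt.Println(routeDemo(stubFlash))
	fmt.Println(routeDemo(stubDown))
	fmt.Println(routeDemo(stubHang))
}
```

Treating an invalid `model_choice` the same as a transport error keeps every failure mode on one path, so a misbehaving 1B model cannot select a model name the router does not know.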
+
+---
+
+## Routing Prompts
+
+These are the exact prompts used by the `gemini-cli` to force the small 1B model to output structured JSON with strict reasoning criteria.
+
+### 1. The Complexity Rubric
+```markdown
+### Complexity Rubric
+A task is COMPLEX (Choose \`pro\`) if it meets ONE OR MORE of the following criteria:
+1. **High Operational Complexity (Est. 4+ Steps/Tool Calls):** Requires dependent actions, significant planning, or multiple coordinated changes.
+2. **Strategic Planning & Conceptual Design:** Asking "how" or "why." Requires advice, architecture, or high-level strategy.
+3. **High Ambiguity or Large Scope (Extensive Investigation):** Broadly defined requests requiring extensive investigation.
+4. **Deep Debugging & Root Cause Analysis:** Diagnosing unknown or complex problems from symptoms.
+A task is SIMPLE (Choose \`flash\`) if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls). Operational simplicity overrides strategic phrasing.
+```
+
+### 2. Output Format Enforcement
+```markdown
+### Output Format
+Respond *only* in JSON format like this:
+{
+  "reasoning": Your reasoning...
+  "model_choice": Either flash or pro
+}
+And you must follow the following JSON schema:
+{
+  "type": "object",
+  "properties": {
+    "reasoning": {
+      "type": "string",
+      "description": "A brief summary of the user objective, followed by a step-by-step explanation for the model choice, referencing the rubric."
+    },
+    "model_choice": {
+      "type": "string",
+      "enum": ["flash", "pro"]
+    }
+  },
+  "required": ["reasoning", "model_choice"]
+}
+You must ensure that your reasoning is no more than 2 sentences long and directly references the rubric criteria.
+When making your decision, the user's request should be weighted much more heavily than the surrounding context when making your determination.
+```
+
+### 3. The Main System Prompt
+```markdown
+### Role
+You are the **Lead Orchestrator** for an AI system. You do not talk to users. Your sole responsibility is to analyze the **Chat History** and delegate the **Current Request** to the most appropriate **Model** based on the request's complexity.
+
+### Models
+Choose between \`flash\` (SIMPLE) or \`pro\` (COMPLEX).
+1. \`flash\`: A fast, efficient model for simple, well-defined tasks.
+2. \`pro\`: A powerful, advanced model for complex, open-ended, or multi-step tasks.
+
+[... Injects COMPLEXITY_RUBRIC here ...]
+
+[... Injects OUTPUT_FORMAT here ...]
+
+### Examples
+**Example 1 (Strategic Planning):**
+*User Prompt:* "How should I architect the data pipeline for this new analytics service?"
+*Your JSON Output:*
+{
+  "reasoning": "The user is asking for high-level architectural design and strategy. This falls under 'Strategic Planning & Conceptual Design'.",
+  "model_choice": "pro"
+}
+**Example 2 (Simple Tool Use):**
+*User Prompt:* "list the files in the current directory"
+*Your JSON Output:*
+{
+  "reasoning": "This is a direct command requiring a single tool call (ls). It has Low Operational Complexity (1 step).",
+  "model_choice": "flash"
+}
+**Example 3 (High Operational Complexity):**
+*User Prompt:* "I need to add a new 'email' field to the User schema in 'src/models/user.ts', migrate the database, and update the registration endpoint."
+*Your JSON Output:*
+{
+  "reasoning": "This request involves multiple coordinated steps across different files and systems. This meets the criteria for High Operational Complexity (4+ steps).",
+  "model_choice": "pro"
+}
+**Example 4 (Simple Read):**
+*User Prompt:* "Read the contents of 'package.json'."
+*Your JSON Output:*
+{
+  "reasoning": "This is a direct command requiring a single read. It has Low Operational Complexity (1 step).",
+  "model_choice": "flash"
+}
+**Example 5 (Deep Debugging):**
+*User Prompt:* "I'm getting an error 'Cannot read property 'map' of undefined' when I click the save button. Can you fix it?"
+*Your JSON Output:*
+{
+  "reasoning": "The user is reporting an error symptom without a known cause. This requires investigation and falls under 'Deep Debugging'.",
+  "model_choice": "pro"
+}
+**Example 6 (Simple Edit despite Phrasing):**
+*User Prompt:* "What is the best way to rename the variable 'data' to 'userData' in 'src/utils.js'?"
+*Your JSON Output:*
+{
+  "reasoning": "Although the user uses strategic language ('best way'), the underlying task is a localized edit. The operational complexity is low (1-2 steps).",
+  "model_choice": "flash"
+}
+```
+
+### 4. The Per-Request Prompt Structure
+For every routing decision, the CLI flattens the last ~4 turns of chat history and appends the new user request.
+
+```markdown
+You are provided with a **Chat History** and the user's **Current Request** below.
+
+#### Chat History:
+[... Flattened text of the last 4 turns, excluding tool calls ...]
+
+#### Current Request:
+"[... The actual text of what the user just typed ...]"
+```
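
The per-request structure above could be assembled with a small context flattener along these lines. This is an illustrative Go sketch, not the `gemini-cli` implementation: the `Turn` type, the `role: text` line format, and the choice to drop tool calls before taking the last N turns are all assumptions. Only the headings and the surrounding sentence come from the prompt template.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is a hypothetical chat-history entry. IsToolCall marks entries that
// the flattener leaves out.
type Turn struct {
	Role       string
	Text       string
	IsToolCall bool
}

// BuildRoutingPrompt drops tool calls, keeps the last maxTurns remaining
// turns, and renders them in the per-request layout.
func BuildRoutingPrompt(history []Turn, request string, maxTurns int) string {
	if maxTurns < 0 {
		maxTurns = 0
	}
	var kept []Turn
	for _, t := range history {
		if !t.IsToolCall {
			kept = append(kept, t)
		}
	}
	if len(kept) > maxTurns {
		kept = kept[len(kept)-maxTurns:]
	}

	var b strings.Builder
	b.WriteString("You are provided with a **Chat History** and the user's **Current Request** below.\n\n")
	b.WriteString("#### Chat History:\n")
	for _, t := range kept {
		// "role: text" is an assumed flattening format.
		fmt.Fprintf(&b, "%s: %s\n", t.Role, t.Text)
	}
	b.WriteString("\n#### Current Request:\n")
	b.WriteString("\"" + request + "\"\n")
	return b.String()
}

func main() {
	history := []Turn{
		{Role: "user", Text: "open the config"},
		{Role: "tool", Text: "read_file(config.toml)", IsToolCall: true},
		{Role: "model", Text: "Here is the config."},
	}
	fmt.Print(BuildRoutingPrompt(history, "list the files in the current directory", 4))
}
```

Filtering tool calls before truncating means a burst of tool activity does not push the user's recent messages out of the window. Whether the original truncates before or after filtering is not stated in the analysis.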