From 488201b908fdb506f81c4cd300a3b4febfa8ed38 Mon Sep 17 00:00:00 2001
From: vikingowl <26+vikingowl@noreply.somegit.dev>
Date: Thu, 7 May 2026 00:21:52 +0200
Subject: [PATCH] docs: add TODO roadmap for gemma routing, USP integration,
 local tmp, and ELF support

---
 TODO.md                       |  72 +++++++++++++++-
 gemma-integration-analysis.md | 153 ++++++++++++++++++++++++++++++++++
 2 files changed, 222 insertions(+), 3 deletions(-)
 create mode 100644 gemma-integration-analysis.md

diff --git a/TODO.md b/TODO.md
index 68dd2d7..e20fe8c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,7 +1,73 @@
-# Gnoma ELF Support - TODO List
+# Gnoma - TODO List
 
-## Overview
-This document outlines the steps to add **ELF (Executable and Linkable Format)** support to Gnoma, enabling features like ELF parsing, disassembly, security analysis, and binary manipulation.
+---
+
+## Gemma Integration (Local Model Routing)
+
+See [`gemma-integration-analysis.md`](gemma-integration-analysis.md) for the full architecture analysis, routing prompts, and implementation checklist.
+
+- [ ] Infrastructure & asset management (platform detection, safe installer, model manager)
+- [ ] Process & server management (background daemon, state tracking, auto-start)
+- [ ] Routing logic (complexity rubric, context flattener, strategy implementation)
+- [ ] UX (management commands, slash command, status UI)
+- [ ] Configuration & safety (scoped settings, failure resilience)
+
+---
+
+## Built-in Security Pilot (USP Integration)
+
+### Overview
+Ship the [Universal Security Pilot](https://github.com/VikingOwl91/universal-security-pilot) (USP) capabilities as first-class features in gnoma's core instead of relying on external Markdown files and tool-specific adapters. Gnoma becomes the runtime for USP: the audit engine, remediation workflow, and AI hardening logic live inside the binary.
+
+### Core Capabilities to Internalize
+- [ ] **Security audit engine** — the eight-rule zero-trust review (adversarial input, context-aware footguns, identity integrity, atomicity, secret hygiene, AI guarding, SSRF/Dial-Control, multilingual defense)
+- [ ] **Wave Protocol enforcement** — mandatory remediation ordering (W0→W1→W2→W3→W4→W5→W6), descending blast radius within each wave, and cross-wave dependency resolution
+- [ ] **Iron Law** — no fix ships without a failing PoC test; enforce this in the remediation workflow
+- [ ] **Standards citation** — every finding must map to OWASP Top 10 / ASVS / LLM Top 10 / MITRE ATLAS / CWE IDs
+- [ ] **AI hardening** — six-axis LLM hardening (prompt boundaries, output sanitization, BudgetGate, Dial-Control, injection vectors, multilingual defense)
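+
+The wave ordering reduces to a two-key sort. The sketch below is a minimal illustration, assuming a hypothetical `Finding` type with an integer `Wave` (0 for W0 through 6 for W6) and a numeric `BlastRadius` score; neither name comes from USP, and cross-wave dependency resolution is left out:
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// Finding is a hypothetical audit finding; USP does not define these field names.
+type Finding struct {
+	ID          string
+	Wave        int // 0..6, W0 is remediated first
+	BlastRadius int // higher means more of the system is affected
+}
+
+// OrderForRemediation sorts by wave ascending, then by blast radius
+// descending within each wave. Ties keep their original relative order.
+func OrderForRemediation(fs []Finding) {
+	sort.SliceStable(fs, func(i, j int) bool {
+		if fs[i].Wave != fs[j].Wave {
+			return fs[i].Wave < fs[j].Wave
+		}
+		return fs[i].BlastRadius > fs[j].BlastRadius
+	})
+}
+
+func main() {
+	fs := []Finding{
+		{ID: "net-1", Wave: 2, BlastRadius: 5},
+		{ID: "auth-1", Wave: 1, BlastRadius: 3},
+		{ID: "auth-2", Wave: 1, BlastRadius: 9},
+	}
+	OrderForRemediation(fs)
+	for _, f := range fs {
+		fmt.Println(f.ID) // auth-2, auth-1, net-1
+	}
+}
+```
+
+A stable sort keeps the auditor's original order for findings that tie on both keys.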
+
+### Implementation Steps
+- [ ] **Skill system**: implement `sec-audit`, `sec-fix`, `ai-harden`, `sec-init` as built-in gnoma skills (not external file reads)
+- [ ] **Footgun library**: embed the universal footgun catalog (categories A–D) and framework-specific instances as structured data gnoma can query during audits
+- [ ] **Severity grading**: Critical/High/Medium/Low/Info with the canonical definitions, used in audit report output
+- [ ] **Complexity rubric**: language-specific footgun tables (Go, TS/JS, Rust, Python, etc.) as queryable rules
+- [ ] **Canonical patterns**: ship BudgetGate, Dial-Control, Envelope Encryption, OIDC state-verification as referenceable code templates gnoma can suggest or scaffold
+- [ ] **Project-local override**: support `.gnoma/security/project-pilot.toml` (or similar) for per-project tightening (never loosening)
+- [ ] **Rationalization resistance**: the anti-pressure table from `sec-fix` (claims such as "approved" or "rushed deadline" do not override the discipline)
+- [ ] **Report generation**: structured Markdown audit reports with standards citations, severity, and wave assignment
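+
+Tighten-only overrides can be enforced mechanically once the severity grades are ordered. A minimal sketch, assuming severities are modeled as an ordered Go enum and that a project-local override may only raise a finding's severity; `Severity` and `Effective` are hypothetical names, not part of USP:
+
+```go
+package main
+
+import "fmt"
+
+// Severity uses the five canonical grades; the numeric order is an assumption
+// chosen so that "tighter" means "higher".
+type Severity int
+
+const (
+	Info Severity = iota
+	Low
+	Medium
+	High
+	Critical
+)
+
+func (s Severity) String() string {
+	return [...]string{"Info", "Low", "Medium", "High", "Critical"}[s]
+}
+
+// Effective applies a project-local override to a canonical severity.
+// Overrides may raise a severity but never lower it, so the result is the max.
+func Effective(canonical, override Severity) Severity {
+	if override > canonical {
+		return override
+	}
+	return canonical
+}
+
+func main() {
+	fmt.Println(Effective(Medium, High)) // High: the override tightens
+	fmt.Println(Effective(High, Low))    // High: the loosening is ignored
+}
+```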
+
+### Considerations
+- USP is tool-agnostic by design; gnoma's implementation should preserve the framework's principles while making them native
+- The Wave Protocol ordering is load-bearing — W1 (auth) must complete before W2 (network), etc.
+- Project-local overrides can tighten but never loosen the canonical rules
+- Embed the footgun library as Go structs, not as runtime-parsed Markdown
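+
+Embedding the catalog as Go structs could look like the sketch below. The `Footgun` fields, the `ForLanguage` query, and both catalog entries are hypothetical placeholders rather than USP's actual catalog; only the CWE ID format (`CWE-918` is the real ID for SSRF) follows an external standard:
+
+```go
+package main
+
+import "fmt"
+
+// Footgun is a hypothetical shape for one catalog entry; field names are illustrative.
+type Footgun struct {
+	ID       string
+	Category byte     // 'A'..'D'; assignments below are placeholders
+	Language string   // "" marks a language-agnostic entry
+	Title    string
+	CWE      []string // standards citations, e.g. "CWE-918"
+}
+
+// catalog is compiled into the binary; these entries are placeholders only.
+var catalog = []Footgun{
+	{ID: "placeholder-ssrf", Category: 'A', Title: "Placeholder: outbound request to user-controlled URL", CWE: []string{"CWE-918"}},
+	{ID: "placeholder-go", Category: 'B', Language: "go", Title: "Placeholder: Go-specific entry"},
+}
+
+// ForLanguage returns entries that apply to lang, including language-agnostic ones.
+func ForLanguage(lang string) []Footgun {
+	var out []Footgun
+	for _, f := range catalog {
+		if f.Language == "" || f.Language == lang {
+			out = append(out, f)
+		}
+	}
+	return out
+}
+
+func main() {
+	for _, f := range ForLanguage("go") {
+		fmt.Println(f.ID, string(f.Category), f.CWE)
+	}
+}
+```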
+
+---
+
+## Local Tmp Folder (`.gnoma/tmp/`)
+
+### Overview
+Per-project temporary directory at `.gnoma/tmp/[current-working-dir]` for scratch files, intermediate outputs, and ephemeral state that shouldn't pollute the project tree or system tmp.
+
+### Implementation Steps
+- [ ] Create `.gnoma/tmp/` directory structure on first use (lazy initialization)
+- [ ] Derive subdirectory name from current working directory (hash or sanitized path)
+- [ ] Add helpers to resolve tmp paths: `gnoma.TmpDir(cwd string) string`
+- [ ] Auto-cleanup policy (e.g., prune entries older than N days, or on session end)
+- [ ] Add `.gnoma/tmp/` to default `.gitignore` generation
+- [ ] Use for tool scratch space (e.g., ELF analysis intermediates, diff staging, etc.)
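+
+A minimal sketch of the path helper, assuming the hashed variant of the derivation step, a project-local `.gnoma/` in the working directory with a `~/.gnoma/` fallback, a 16-hex-character key, and `0o700` permissions; `EnsureTmpDir` is a hypothetical companion for lazy creation:
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+	"os"
+	"path/filepath"
+)
+
+// TmpDir derives a deterministic scratch directory for cwd: a short SHA-256
+// of the cleaned absolute path under <cwd>/.gnoma/tmp/ if <cwd>/.gnoma exists,
+// otherwise under ~/.gnoma/tmp/. It only computes the path.
+func TmpDir(cwd string) string {
+	abs, err := filepath.Abs(cwd)
+	if err != nil {
+		abs = filepath.Clean(cwd)
+	}
+	sum := sha256.Sum256([]byte(abs))
+	key := hex.EncodeToString(sum[:8]) // 16 hex characters
+
+	root := filepath.Join(abs, ".gnoma")
+	if info, err := os.Stat(root); err != nil || !info.IsDir() {
+		home, herr := os.UserHomeDir()
+		if herr != nil {
+			home = os.TempDir()
+		}
+		root = filepath.Join(home, ".gnoma")
+	}
+	return filepath.Join(root, "tmp", key)
+}
+
+// EnsureTmpDir creates the directory on first use (lazy initialization).
+func EnsureTmpDir(cwd string) (string, error) {
+	dir := TmpDir(cwd)
+	return dir, os.MkdirAll(dir, 0o700)
+}
+
+func main() {
+	fmt.Println(TmpDir("."))
+}
+```
+
+Two gnoma instances in the same project resolve to the same directory; per-session isolation could be layered on top with `os.MkdirTemp(dir, "session-*")`.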
+
+### Considerations
+- Avoid collisions when multiple gnoma instances target the same project
+- Keep path derivation deterministic so the same project always maps to the same tmp dir
+- Respect XDG conventions where applicable (fallback to `~/.gnoma/tmp/` if no project-local `.gnoma/`)
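+
+One possible cleanup policy is age-based pruning. This sketch, assuming a hypothetical `PruneOlderThan` helper that is not yet part of gnoma, removes direct children of the tmp directory whose modification time is older than a cutoff. A directory's mtime changes only when its own entries change, not when nested files are edited, so this is a coarse policy:
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"time"
+)
+
+// PruneOlderThan removes direct children of dir whose own mtime is older than
+// maxAge and returns how many were removed. A missing dir is not an error,
+// because the tmp tree is created lazily.
+func PruneOlderThan(dir string, maxAge time.Duration, now time.Time) (int, error) {
+	entries, err := os.ReadDir(dir)
+	if os.IsNotExist(err) {
+		return 0, nil
+	}
+	if err != nil {
+		return 0, err
+	}
+	removed := 0
+	for _, e := range entries {
+		info, err := e.Info()
+		if err != nil {
+			continue // entry vanished between ReadDir and Info
+		}
+		if now.Sub(info.ModTime()) > maxAge {
+			if err := os.RemoveAll(filepath.Join(dir, e.Name())); err != nil {
+				return removed, err
+			}
+			removed++
+		}
+	}
+	return removed, nil
+}
+
+func main() {
+	dir := filepath.Join(os.TempDir(), "gnoma-example-missing")
+	n, err := PruneOlderThan(dir, 7*24*time.Hour, time.Now())
+	fmt.Println(n, err)
+}
+```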
+
+---
+
+## ELF Support
+
+### Overview
+This section outlines the steps to add **ELF (Executable and Linkable Format)** support to Gnoma, enabling features like ELF parsing, disassembly, security analysis, and binary manipulation.
 
 ---
 
diff --git a/gemma-integration-analysis.md b/gemma-integration-analysis.md
new file mode 100644
index 0000000..1e510e3
--- /dev/null
+++ b/gemma-integration-analysis.md
@@ -0,0 +1,153 @@
+# Gemini CLI Local Model Routing (/gemma) Architecture
+
+The `/gemma` integration in `gemini-cli` uses a local LLM for model routing: it automatically decides whether a request goes to a cheaper, faster model (Flash) or a more capable one (Pro), based on the complexity of the user's request.
+
+## Core Architecture
+*   **Engine:** Uses **LiteRT-LM**, a lightweight runtime that serves Gemma models via a Gemini-compatible HTTP API.
+*   **Model:** Specifically uses a quantized **Gemma 3 1B** model (`gemma3-1b-gpu-custom`). It's ~1GB and runs locally with low latency (~100-200ms for classification).
+*   **Orchestration:** The CLI manages the LiteRT server as a background daemon, tracking its state via PID files and logs.
+*   **Integration:** A `GemmaClassifierStrategy` is injected into the core `ModelRouterService`. It flattens recent chat history, sends it to the local Gemma model with a strict "Complexity Rubric," and uses the JSON response to switch models dynamically.
+
+---
+
+## Integration Todo List
+
+### 1. Infrastructure & Asset Management
+- [ ] **Platform Detection:** Logic to map OS/Arch to the correct LiteRT-LM binary download URL.
+- [ ] **Safe Installer:** Implementation of binary download + SHA256 checksum verification + permission handling (`chmod +x`, macOS quarantine removal).
+- [ ] **Model Manager:** Wrapper for the `litert-lm pull` command to download and verify the 1GB Gemma model.
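+
+The checksum step of the safe installer needs only the standard library. A minimal sketch, assuming the expected digest is published as lowercase hex alongside each binary and that a download table keyed by `GOOS-GOARCH` exists; `platformKey` and `VerifySHA256` are hypothetical names, and LiteRT-LM's real asset names and URLs are not shown:
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+	"io"
+	"os"
+	"runtime"
+	"strings"
+)
+
+// platformKey is the lookup key for a (hypothetical) key -> URL/checksum table.
+func platformKey() string {
+	return runtime.GOOS + "-" + runtime.GOARCH
+}
+
+// VerifySHA256 streams the file through SHA-256 and compares the hex digest
+// with wantHex, case-insensitively.
+func VerifySHA256(path, wantHex string) error {
+	f, err := os.Open(path)
+	if err != nil {
+		return err
+	}
+	defer f.Close()
+	h := sha256.New()
+	if _, err := io.Copy(h, f); err != nil {
+		return err
+	}
+	got := hex.EncodeToString(h.Sum(nil))
+	if got != strings.ToLower(wantHex) {
+		return fmt.Errorf("checksum mismatch for %s: got %s, want %s", path, got, wantHex)
+	}
+	return nil
+}
+
+func main() {
+	fmt.Println("platform:", platformKey())
+}
+```
+
+After a successful check, the installer would mark the binary executable (for example with `os.Chmod(path, 0o755)`) before first use.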
+
+### 2. Process & Server Management
+- [ ] **Background Daemon:** Implementation of `spawn(..., { detached: true })` to keep the LiteRT server running independently of the CLI session.
+- [ ] **State Tracking:** A PID-file system to manage server lifecycle (start/stop/status) and prevent port collisions.
+- [ ] **Auto-Start Logic:** A manager class (`LiteRtServerManager`) that checks server health on CLI startup and launches it if enabled in settings.
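+
+State tracking can be as simple as a PID file plus a liveness probe. A minimal Unix-only sketch, assuming one decimal PID per file; `WritePID` and `IsRunning` are hypothetical helpers, and signal 0 is the POSIX convention for checking that a process exists without signaling it. On Windows `os.Process.Signal` does not support this probe, so `IsRunning` always reports false there:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"os"
+	"strconv"
+	"strings"
+	"syscall"
+)
+
+// WritePID records the server's process ID (e.g. cmd.Process.Pid after Start).
+func WritePID(path string, pid int) error {
+	return os.WriteFile(path, []byte(strconv.Itoa(pid)+"\n"), 0o644)
+}
+
+// IsRunning reports whether the PID in pidFile refers to a live process.
+// EPERM means the process exists but belongs to another user.
+// A missing, malformed, or stale file reports false.
+func IsRunning(pidFile string) bool {
+	b, err := os.ReadFile(pidFile)
+	if err != nil {
+		return false
+	}
+	pid, err := strconv.Atoi(strings.TrimSpace(string(b)))
+	if err != nil || pid <= 0 {
+		return false
+	}
+	p, err := os.FindProcess(pid)
+	if err != nil {
+		return false
+	}
+	err = p.Signal(syscall.Signal(0))
+	return err == nil || errors.Is(err, syscall.EPERM)
+}
+
+func main() {
+	f, err := os.CreateTemp("", "litert-*.pid")
+	if err != nil {
+		panic(err)
+	}
+	f.Close()
+	defer os.Remove(f.Name())
+	if err := WritePID(f.Name(), os.Getpid()); err != nil {
+		panic(err)
+	}
+	fmt.Println("running:", IsRunning(f.Name()))
+}
+```
+
+PIDs can be reused after the server exits, so a health request to the server's port is a useful second signal before reusing a running instance.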
+
+### 3. Routing Logic (The "Brain")
+- [ ] **Complexity Rubric:** A specialized system prompt that defines what constitutes a "SIMPLE" vs "COMPLEX" task.
+- [ ] **Context Flattener:** Utility to compress the last ~4-20 turns of chat history into a prompt suitable for a small 1B model.
+- [ ] **Strategy Implementation:** The `GemmaClassifierStrategy` class to handle the local API call, parse the JSON "reasoning," and return the model decision.
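+
+For a 1B model, context flattening mostly means dropping noise and keeping the newest turns. A minimal sketch, assuming a hypothetical `Turn` type and a "role: text" flattening; it skips tool-call turns and emits the per-request layout shown under "The Per-Request Prompt Structure" below:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// Turn is a hypothetical chat-history entry.
+type Turn struct {
+	Role       string // "user" or "model"
+	Text       string
+	IsToolCall bool
+}
+
+// BuildRoutingPrompt keeps the newest maxTurns non-tool, non-empty turns,
+// flattens each to "role: text", and appends the current request.
+func BuildRoutingPrompt(history []Turn, request string, maxTurns int) string {
+	var kept []string
+	for i := len(history) - 1; i >= 0 && len(kept) < maxTurns; i-- {
+		t := history[i]
+		if t.IsToolCall || strings.TrimSpace(t.Text) == "" {
+			continue
+		}
+		kept = append(kept, t.Role+": "+t.Text)
+	}
+	// kept was collected newest-first; reverse it into chronological order.
+	for i, j := 0, len(kept)-1; i < j; i, j = i+1, j-1 {
+		kept[i], kept[j] = kept[j], kept[i]
+	}
+
+	var b strings.Builder
+	b.WriteString("You are provided with a **Chat History** and the user's **Current Request** below.\n\n")
+	b.WriteString("#### Chat History:\n")
+	b.WriteString(strings.Join(kept, "\n"))
+	b.WriteString("\n\n#### Current Request:\n")
+	b.WriteString(`"` + request + `"` + "\n")
+	return b.String()
+}
+
+func main() {
+	history := []Turn{
+		{Role: "user", Text: "where is the config loaded?"},
+		{Role: "model", Text: "read_file config.go", IsToolCall: true},
+		{Role: "model", Text: "It is loaded in config.go."},
+	}
+	fmt.Print(BuildRoutingPrompt(history, "rename loadConfig to LoadConfig", 4))
+}
+```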
+
+### 4. User Experience (CLI & UI)
+- [ ] **Management Commands:** Commands like `gemini gemma {setup|start|stop|status|logs}` for lifecycle and troubleshooting.
+- [ ] **Slash Command:** A built-in `/gemma` command that queries the local server health and displays a status panel inside a session.
+- [ ] **React/Ink UI:** A status component to show visual indicators (green/red) for the binary, model, and server state.
+
+### 5. Configuration & Safety
+- [ ] **Scoped Settings:** Separate "User" settings (binary path) from "Workspace" settings (router enabled/disabled for a specific project).
+- [ ] **Failure Resilience:** Logic to gracefully fall back to the default model if the local classifier times out or fails.
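+
+The fallback can be enforced at the call site, independent of how the classifier is implemented. A minimal sketch, assuming the classifier is exposed as a hypothetical `Classifier` function type; the call runs in a goroutine so that even a classifier that ignores its context cannot stall routing past the timeout:
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"time"
+)
+
+// Classifier is a hypothetical signature for the local Gemma routing call.
+type Classifier func(ctx context.Context, prompt string) (string, error)
+
+// Route returns the classifier's choice, or defaultModel if the call fails,
+// times out, or names a model other than "flash" or "pro".
+func Route(classify Classifier, prompt, defaultModel string, timeout time.Duration) string {
+	ctx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+
+	type result struct {
+		choice string
+		err    error
+	}
+	ch := make(chan result, 1) // buffered so a late goroutine never blocks
+	go func() {
+		choice, err := classify(ctx, prompt)
+		ch <- result{choice, err}
+	}()
+
+	select {
+	case r := <-ch:
+		if r.err == nil && (r.choice == "flash" || r.choice == "pro") {
+			return r.choice
+		}
+		return defaultModel
+	case <-ctx.Done():
+		return defaultModel
+	}
+}
+
+func main() {
+	hung := func(ctx context.Context, _ string) (string, error) {
+		time.Sleep(time.Second) // simulates a classifier that ignores ctx
+		return "flash", nil
+	}
+	fmt.Println(Route(hung, "refactor the auth module", "pro", 50*time.Millisecond)) // pro
+}
+```
+
+The buffered channel lets a late classifier goroutine deliver its result and exit instead of blocking forever.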
+
+---
+
+## Routing Prompts
+
+These are the exact prompts used by the `gemini-cli` to force the small 1B model to output structured JSON with strict reasoning criteria.
+
+### 1. The Complexity Rubric
+```markdown
+### Complexity Rubric
+A task is COMPLEX (Choose \`pro\`) if it meets ONE OR MORE of the following criteria:
+1.  **High Operational Complexity (Est. 4+ Steps/Tool Calls):** Requires dependent actions, significant planning, or multiple coordinated changes.
+2.  **Strategic Planning & Conceptual Design:** Asking "how" or "why." Requires advice, architecture, or high-level strategy.
+3.  **High Ambiguity or Large Scope (Extensive Investigation):** Broadly defined requests requiring extensive investigation.
+4.  **Deep Debugging & Root Cause Analysis:** Diagnosing unknown or complex problems from symptoms.
+A task is SIMPLE (Choose \`flash\`) if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls). Operational simplicity overrides strategic phrasing.
+```
+
+### 2. Output Format Enforcement
+```markdown
+### Output Format
+Respond *only* in JSON format like this:
+{
+  "reasoning": Your reasoning...
+  "model_choice": Either flash or pro
+}
+And you must follow the following JSON schema:
+{
+  "type": "object",
+  "properties": {
+    "reasoning": {
+      "type": "string",
+      "description": "A brief summary of the user objective, followed by a step-by-step explanation for the model choice, referencing the rubric."
+    },
+    "model_choice": {
+      "type": "string",
+      "enum": ["flash", "pro"]
+    }
+  },
+  "required": ["reasoning", "model_choice"]
+}
+You must ensure that your reasoning is no more than 2 sentences long and directly references the rubric criteria.
+When making your decision, the user's request should be weighted much more heavily than the surrounding context when making your determination.
+```
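+
+Because a 1B model does not always emit clean JSON (the example in the prompt itself is not strictly valid JSON), the response should be validated before use. A minimal sketch, assuming a hypothetical `ParseDecision` helper; it decodes the span between the first `{` and the last `}` and enforces the schema's required fields and `flash`/`pro` enum:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+)
+
+// Decision mirrors the JSON schema in the output-format prompt.
+type Decision struct {
+	Reasoning   string `json:"reasoning"`
+	ModelChoice string `json:"model_choice"`
+}
+
+// ParseDecision decodes the text from the first '{' to the last '}' (small
+// models sometimes wrap JSON in prose) and validates it against the schema.
+func ParseDecision(raw string) (Decision, error) {
+	start := strings.Index(raw, "{")
+	end := strings.LastIndex(raw, "}")
+	if start < 0 || end < start {
+		return Decision{}, fmt.Errorf("no JSON object in response")
+	}
+	var d Decision
+	if err := json.Unmarshal([]byte(raw[start:end+1]), &d); err != nil {
+		return Decision{}, err
+	}
+	if d.Reasoning == "" {
+		return Decision{}, fmt.Errorf("missing reasoning")
+	}
+	if d.ModelChoice != "flash" && d.ModelChoice != "pro" {
+		return Decision{}, fmt.Errorf("invalid model_choice %q", d.ModelChoice)
+	}
+	return d, nil
+}
+
+func main() {
+	reply := `Here is my answer: {"reasoning": "Direct single read.", "model_choice": "flash"}`
+	d, err := ParseDecision(reply)
+	fmt.Println(d.ModelChoice, err) // flash <nil>
+}
+```
+
+Any error from `ParseDecision` should route to the default model, in line with the failure-resilience item above.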
+
+### 3. The Main System Prompt
+```markdown
+### Role
+You are the **Lead Orchestrator** for an AI system. You do not talk to users. Your sole responsibility is to analyze the **Chat History** and delegate the **Current Request** to the most appropriate **Model** based on the request's complexity.
+
+### Models
+Choose between `flash` (SIMPLE) or `pro` (COMPLEX).
+1.  `flash`: A fast, efficient model for simple, well-defined tasks.
+2.  `pro`: A powerful, advanced model for complex, open-ended, or multi-step tasks.
+
+[... Injects COMPLEXITY_RUBRIC here ...]
+
+[... Injects OUTPUT_FORMAT here ...]
+
+### Examples
+**Example 1 (Strategic Planning):**
+*User Prompt:* "How should I architect the data pipeline for this new analytics service?"
+*Your JSON Output:*
+{
+  "reasoning": "The user is asking for high-level architectural design and strategy. This falls under 'Strategic Planning & Conceptual Design'.",
+  "model_choice": "pro"
+}
+**Example 2 (Simple Tool Use):**
+*User Prompt:* "list the files in the current directory"
+*Your JSON Output:*
+{
+  "reasoning": "This is a direct command requiring a single tool call (ls). It has Low Operational Complexity (1 step).",
+  "model_choice": "flash"
+}
+**Example 3 (High Operational Complexity):**
+*User Prompt:* "I need to add a new 'email' field to the User schema in 'src/models/user.ts', migrate the database, and update the registration endpoint."
+*Your JSON Output:*
+{
+  "reasoning": "This request involves multiple coordinated steps across different files and systems. This meets the criteria for High Operational Complexity (4+ steps).",
+  "model_choice": "pro"
+}
+**Example 4 (Simple Read):**
+*User Prompt:* "Read the contents of 'package.json'."
+*Your JSON Output:*
+{
+  "reasoning": "This is a direct command requiring a single read. It has Low Operational Complexity (1 step).",
+  "model_choice": "flash"
+}
+**Example 5 (Deep Debugging):**
+*User Prompt:* "I'm getting an error 'Cannot read property 'map' of undefined' when I click the save button. Can you fix it?"
+*Your JSON Output:*
+{
+  "reasoning": "The user is reporting an error symptom without a known cause. This requires investigation and falls under 'Deep Debugging'.",
+  "model_choice": "pro"
+}
+**Example 6 (Simple Edit despite Phrasing):**
+*User Prompt:* "What is the best way to rename the variable 'data' to 'userData' in 'src/utils.js'?"
+*Your JSON Output:*
+{
+  "reasoning": "Although the user uses strategic language ('best way'), the underlying task is a localized edit. The operational complexity is low (1-2 steps).",
+  "model_choice": "flash"
+}
+```
+
+### 4. The Per-Request Prompt Structure
+For every routing decision, the CLI flattens the last ~4 turns of chat history and appends the new user request.
+
+```markdown
+You are provided with a **Chat History** and the user's **Current Request** below.
+
+#### Chat History:
+[... Flattened text of the last 4 turns, excluding tool calls ...]
+
+#### Current Request:
+"[... The actual text of what the user just typed ...]"
+```