gnoma

Author	SHA1	Message	Date
vikingowl	c0c2e4bff5	fix(slm): enforce JSON output + strip thinking-block prefixes Two structural fixes for the SLM classifier's 100% failure rate: (1) Pass ResponseFormat=json_object + Temperature=0 + TopP=1 + MaxTokens=128 in the classifier Request. The provider type already supports these but callSLM was leaving them unset, which meant ollama (and any other backend) ran with default sampling and free-form text output. format=json mode in particular makes ollama emit only valid JSON at decoding time — eliminates the majority of parse failures. (2) Harden extractJSON to strip common thinking-block tags before hunting for the brace. Seen in the wild: <think>…</think> (Qwen3 distillations) and <Thought Process>…</Thought Process> (tiny3.5). Defensive list also covers <reasoning>, <thoughts>. Unterminated thinking blocks fall back to brace-search so we still have a shot. Table-driven tests cover all variants plus the no-tag and fenced-json paths to confirm no regression. Even with format=json on a capable provider, the extractor is the safety net for backends that don't enforce format strictly — same defence-in-depth shape as the existing fence stripping. Doesn't fix the deeper architecture question (encoder + bandit preferred over decoder-SLM as classifier — see plan doc landing in the same PR); fixes the immediate bug.	2026-05-25 01:19:51 +02:00
vikingowl	f3c70bd802	fix(slm,router): honest classifier diagnostics + 15s default timeout Five fixes folded into one commit because they all answer the same question: 'why does my router stats output lie to me?' Issue 1 (timeout). Default classify timeout was 5s — too short for cold-start ollama loads on small models. Bumped to 15s and surfaced as [slm].classify_timeout (0 = built-in default). Empirically caught when a user's reecdev/tiny3.5:1.5b hit 'stream error: context deadline exceeded' on every single classify call. Issue 2 (Warn-level error). The SLM-fallback path logged the underlying error at Debug, invisible without --verbose. Promoted to Warn so a first-time misconfiguration surfaces immediately. The fallback itself is benign; the signal is that the SLM isn't doing the work it was supposed to. Issue 3 (stats hint). Hard-coded 'check that llamafile boots' even when the user is on ollama. Replaced with backend-templated advice read from cfg.SLM.Backend. Also distinguishes three diagnostic cases that were collapsed before: - SLM never called (zero attempts) - SLM called N times but every call fell back (timeout/parse) - SLM working but minority share Issue 4 (effective heuristic share). The classifier breakdown shows 'heuristic' and 'slm_fallback' as separate sources, but both routed through HeuristicClassifier — only the source tag differs. New line under 'total observations' surfaces the combined share honestly: 'effective heuristic share: 100% (44 fallbacks + 10 pure heuristic)'. Issue 5 (config schema). [slm].classify_timeout joins the existing [slm] knobs alongside startup_timeout. Documented inline with the cold-start-load rationale.	2026-05-25 01:05:57 +02:00
vikingowl	c4fde583f5	chore(lint): gofmt sweep + errcheck cleanups in router discovery Apply gofmt -w across the codebase (struct field comment realignment only — no semantic changes) and silence two errcheck warnings on fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery with explicit `_, _ =` discards. Required so `make check` is green before tagging v0.1.0.	2026-05-20 03:13:05 +02:00
vikingowl	fb42202834	refactor(security): seal SecureProvider via unexported marker method The router.SecureProvider interface previously required a public IsSecure() bool method. Any test mock — or future production type — could satisfy it by returning true, defeating the W1 "only wrapped providers may flow past the boundary" contract through convention rather than at the type level. Replaces IsSecure() bool with an unexported security.Marker interface that has a single secured() method. Go's method-set semantics key unexported methods by their defining package, so only types declared in internal/security can satisfy Marker. *SafeProvider gets the lone secured() implementation; router.SecureProvider embeds Marker. The seal forces every test mock that previously implemented IsSecure() to either (a) be wrapped with security.WrapProvider(mp, nil) at the use site, or (b) drop the method entirely if the mock never flows through SecureProvider. 93 use sites across 11 test files were updated via a per-package secureMock helper. WrapProvider with a nil firewall ref is a no-op pass-through, so test behavior is unchanged. Empirically: a type from outside internal/security can declare `secured()` but the compiler will reject assigning it to router.SecureProvider because the unexported method belongs to the other package's namespace. Convention → compile-time guarantee.	2026-05-20 02:04:07 +02:00
vikingowl	3c875276c9	feat(security): implement multi-wave audit remediation and agy provider support Implemented full security remediation following Universal Security Pilot protocol: - W1: Enforced SecureProvider at router and engine boundaries to prevent bypasses. - W1: Implemented path-sensitive policy for MCP tools. - W2: Added SHA256 hash verification for SLM downloads (llamafile). - W3: Enhanced secret redaction for private keys (full body) and high-entropy strings. - W4: Fixed symlink-based filesystem sandbox escapes in paths and grep. - W4: Documented CLI agent trust boundaries. Also added 'agy' (Antigravity) as a subprocess CLI provider with plain-text JSON schema support.	2026-05-20 01:13:13 +02:00
vikingowl	342b3903e1	test(slm): align HappyPath with task-type complexity floor The Debug floor (0.4) added in `eb0583f` was bumping the SLM-returned 0.25 up, breaking the HappyPath assertion. Bump the SLM value to 0.55 so the test still verifies "SLM value preserved" (its original intent), and add a dedicated TestClassifier_AppliesTaskTypeFloor that exercises the under-reporting case the floor was added to handle.	2026-05-19 20:54:27 +02:00
vikingowl	58beb7ce3c	feat(router): classifier-source telemetry + router stats command Phase 4 routing decisions depend on knowing whether the SLM classifier is actually firing or whether the heuristic is silently doing all the work. Adds the instrumentation to make that observable. router.ClassifierSource enum (heuristic / slm / slm_fallback) is set on Task by every classifier: - HeuristicClassifier → ClassifierHeuristic - slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when the SLM call fails or returns unparseable output The source is plumbed through router.Outcome to QualityTracker, which now maintains per-source counters alongside the existing per-arm × task EMA scores. QualitySnapshot serializes both (classifier_counts is omitempty for back-compat with pre-feature quality.json files). lazyClassifier logs at INFO the first time it falls back to heuristic because the SLM hasn't booted yet — distinguishes operational fallback from an unconfigured-SLM run. slm.Manager.Start() now records elapsed-to-healthy and the main.go goroutine logs it as part of the "SLM ready" event. Confirms whether short-lived runs are racing the boot cycle. New `gnoma router stats` subcommand prints both tables (arm × task quality, classifier source breakdown) from quality.json with a Phase 4 trust hint when the data is too sparse or the SLM share is low. 6 new tests cover ClassifierSource string/enum, heuristic + SLM source propagation, QualityTracker counter round-trip, and back-compat restore from a legacy quality.json without classifier_counts.	2026-05-19 18:18:22 +02:00
vikingowl	a9213ec382	feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status - slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback, heuristic baseline blended so Priority/RequiredEffort are never zeroed, extractJSON strips markdown fences from small-model responses - router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration - router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior); filterFeasible excludes arms when task.ComplexityScore > MaxComplexity - config.SLMSection: [slm] enabled / model_url / data_dir - openaicompat.NewLlamafile: no API key, model = "default", no retries - slm.Manager: DefaultDataDir() (XDG), Manifest() accessor - cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm registered with MaxComplexity=0.3 when enabled + set up - tui: /config shows slm status (ready/missing/not set up + base URL if running) - docs: roadmap updated to reflect llamafile pivot from Ollama	2026-05-07 16:44:32 +02:00
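
Sketch for c0c2e4bff5 (extractJSON hardening). The commit describes stripping closed thinking-block tags before searching for the JSON braces, with unterminated blocks falling back to a plain brace search. The following is a minimal illustration of that shape, not the repository's code: the function body, the `thinkingBlocks` variable, and the sample inputs are assumptions, and only the tag names are taken from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkingBlocks holds one pattern per tag name (hypothetical variable).
// Go's RE2 engine has no backreferences, so each tag gets its own
// <tag>...</tag> pattern. (?is) = case-insensitive, dot matches newlines.
var thinkingBlocks = func() []*regexp.Regexp {
	tags := []string{"think", "Thought Process", "reasoning", "thoughts"}
	res := make([]*regexp.Regexp, 0, len(tags))
	for _, t := range tags {
		q := regexp.QuoteMeta(t)
		res = append(res, regexp.MustCompile(`(?is)<`+q+`>.*?</`+q+`>`))
	}
	return res
}()

// extractJSON removes closed thinking blocks, then returns the span from
// the first '{' to the last '}'. An unterminated block matches no pattern,
// so the input reaches the brace search unchanged. Markdown fences need no
// special handling here because they sit outside the braces.
func extractJSON(raw string) (string, bool) {
	s := raw
	for _, re := range thinkingBlocks {
		s = re.ReplaceAllString(s, "")
	}
	start := strings.Index(s, "{")
	end := strings.LastIndex(s, "}")
	if start < 0 || end < start {
		return "", false
	}
	return s[start : end+1], true
}

func main() {
	fence := strings.Repeat("`", 3) // built at runtime to keep this listing fence-safe
	inputs := []string{
		`<think>use {json}</think>{"task_type":"debug"}`,
		fence + "json\n{\"task_type\":\"debug\"}\n" + fence,
		"<think>never closed {\"task_type\":\"debug\"}",
		"no json here",
	}
	for _, in := range inputs {
		out, ok := extractJSON(in)
		fmt.Printf("%q %v\n", out, ok)
	}
}
```

The extracted span is only a candidate: a caller would still pass it to `json.Unmarshal` and fall back to the heuristic classifier on error, which matches the defence-in-depth role the commit assigns to the extractor.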

8 Commits
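
Sketch for fb42202834 (sealed SecureProvider). The commit replaces a public `IsSecure() bool` method with an unexported marker method so that satisfying the interface is checked by the compiler. The single-file example below shows only the shape of that pattern, under stated assumptions: everything lives in one package here, so the cross-package seal itself is not demonstrated, and `Provider`, `mockProvider`, `route`, and the one-argument `WrapProvider` are simplified stand-ins rather than the repository's definitions.

```go
package main

import "fmt"

// Marker stands in for security.Marker. When it lives in its own package,
// code outside that package cannot declare a method that counts as this
// unexported secured().
type Marker interface{ secured() }

// Provider is a hypothetical minimal provider interface.
type Provider interface{ Name() string }

// SecureProvider embeds Marker, so only marked types satisfy it.
type SecureProvider interface {
	Provider
	Marker
}

// SafeProvider is the lone type carrying the marker method.
type SafeProvider struct{ inner Provider }

func (s *SafeProvider) Name() string { return s.inner.Name() }
func (s *SafeProvider) secured()     {}

// WrapProvider is simplified to one argument for this sketch; the commit
// mentions a second firewall argument that may be nil.
func WrapProvider(p Provider) *SafeProvider { return &SafeProvider{inner: p} }

// mockProvider plays the role of a test mock with no marker method.
type mockProvider struct{}

func (mockProvider) Name() string { return "mock" }

// route accepts only sealed providers.
func route(p SecureProvider) string { return "routed via " + p.Name() }

func main() {
	// route(mockProvider{}) would not compile: mockProvider lacks secured().
	fmt.Println(route(WrapProvider(mockProvider{}))) // prints "routed via mock"
}
```

One property of this pattern worth keeping in mind: a type in another package can still obtain the marker by embedding a type or interface from the sealing package, so the seal blocks accidental or independent implementations rather than every possible route.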