vikingowl 5d7ff6f061 docs: phase 2 design — usage prediction
Forward-looking signal: extrapolate the weekly buckets' burn rate from a
rolling history and alert when the projection lands before the natural
reset. Captures the design decisions made today (weekly buckets only,
linear regression over 6h, schema v2 with backward-compatible read,
configurable via env). Not yet implemented.
2026-05-18 18:14:47 +02:00


Phase 2 — Usage prediction

Status: design, not implemented.

Goal

Today the reset_watcher only reports state changes that have already happened (a threshold was crossed, a reset was shifted). The Pro/Max weekly budget can still get exhausted before the natural reset, and the first signal is the 80% alert — which is too late to change behavior for the rest of the week.

Phase 2 adds a forward-looking signal: extrapolate the current burn rate to project when each weekly bucket will exhaust, and alert if the projection lands before the natural reset.

Scope

In scope

  • The three weekly buckets: seven_day, seven_day_opus, seven_day_sonnet.
  • A burn rate computed from a rolling history of recent samples.
  • A new alert kind: predicted_exhaustion.
  • State schema bump to retain history (backward-compatible on read).

Out of scope

  • The 5-hour bucket. Five hours is too short for trend fitting to be actionable; a single heavy hour wrecks the slope, and "you'll exhaust in 2h" isn't a useful prediction.
  • Statistical confidence intervals. Single-tier alert; no "low/medium/high certainty" tiers.
  • Multi-model ensembles, ML. A linear fit is the right complexity for the signal-to-noise ratio of this data.

Trigger condition

After computing burn rate, alert when:

current_util + burn_rate * (resets_at - now) > 100%

i.e. "if usage continues at the current rate, the bucket will hit 100% before the natural reset".

Gating: do not alert until at least PREDICT_MIN_SAMPLES history points exist and they span at least PREDICT_MIN_HISTORY_MINUTES. Without these gates, the very first poll after deploy fires a noisy prediction off two data points.
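The gates and the trigger can be sketched as a single predicate. This is a minimal sketch that assumes utilization is stored as a fraction (100% == 1.0), timestamps are unix seconds, and the burn rate has already been fitted. The name `should_predict_exhaustion` and its parameters are hypothetical and are not part of the planned API.

```python
def should_predict_exhaustion(
    samples: list[tuple[int, float]],
    current_util: float,
    burn_rate: float,
    resets_at: int,
    now: int,
    *,
    min_samples: int = 12,
    min_history_minutes: int = 60,
) -> bool:
    """Apply both gates, then the linear trigger condition.

    samples: (unix_seconds, utilization_fraction) pairs that were fed to the fit.
    burn_rate: fitted slope, in utilization fraction per second.
    """
    # Gate 1: at least PREDICT_MIN_SAMPLES history points must exist.
    if not samples or len(samples) < min_samples:
        return False
    # Gate 2: those samples must span at least PREDICT_MIN_HISTORY_MINUTES.
    timestamps = [t for t, _ in samples]
    if max(timestamps) - min(timestamps) < min_history_minutes * 60:
        return False
    # Trigger: current_util + burn_rate * (resets_at - now) > 100% (== 1.0).
    return current_util + burn_rate * (resets_at - now) > 1.0
```

Twelve samples taken five minutes apart span only 55 minutes, so the default gates need a 13th poll before anything can fire.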

Considered and rejected:

  • Time-to-exhaustion threshold ("only alert if exhaustion is within 24h of now"). Suppresses early-warning signals — defeats the point.
  • Headroom percentage ("alert only if projected end-of-window util > 120%"). Filters minor overshoots but also filters legitimate "you'll exhaust 6h before reset" cases.

Burn rate computation

Linear regression over a configurable window of recent samples. The slope is utilization per second (a fraction per second, since history stores fractional utilization). The alert renders it as %/day for readability.

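cutoff = now - PREDICT_WINDOW_HOURS * 3600
samples = [(t, u) for t, u in history if t >= cutoff]   # only the last PREDICT_WINDOW_HOURS
timestamps = [t for t, _ in samples]
utilizations = [u for _, u in samples]
slope, _intercept = fit_line(timestamps, utilizations)  # fit_line: hypothetical hand-rolled OLS helper, no numpy
projected_util_at_reset = current_util + slope * (resets_at - now)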

Considered and rejected:

  • First/last delta. Susceptible to one outlier on either end.
  • Theil-Sen or other robust regressions. Better tolerance for outliers, but ~10x the code and probably not warranted: Anthropic's reported utilization is monotonically non-decreasing within a window, so outliers are rare by construction.
  • EMA of instantaneous slope. Smoother but introduces tunable parameters that complicate the configuration surface.

Avoid numpy (heavy dep; whole image grows ~50 MB). Hand-roll OLS — ~10 lines.
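The hand-rolled OLS could look like the sketch below. It is ordinary least squares for y = slope * x + intercept, computed on mean-centered values so that raw unix timestamps (~1.7e9) do not cost floating-point precision. The name `ols_fit` is hypothetical.

```python
def ols_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centering keeps the sums well-conditioned for large timestamps.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all xs are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

On Python 3.10+, the standard library's `statistics.linear_regression(x, y)` also returns a slope and intercept without numpy. It is an alternative if the image's Python version allows it.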

State schema bump

Schema goes from version: 1 to version: 2. The new fields on WindowState:

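from dataclasses import dataclass, field

@dataclass
class WindowState:
    resets_at: int            # unix seconds of the window's natural reset
    utilization: float
    alerted_thresholds: list[int] = field(default_factory=list)
    # New in v2:
    history: list[tuple[int, float]] = field(default_factory=list)  # (unix_seconds, utilization_fractional)
    predicted_exhaustion_alerted_for_window: bool = False  # one prediction alert per resets_at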

history holds (unix_seconds, utilization_fractional) tuples, capped at PREDICT_HISTORY_RETENTION_HOURS worth of samples (default 24h). At a 5-minute poll cadence that's ≤ 288 entries per bucket, or 864 across all three weekly buckets. At about 20 bytes per compact JSON [timestamp, utilization] pair, that comes to roughly 20 KB, which is still trivial.
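Here is a per-bucket sketch of the append-and-trim step. It assumes samples arrive in time order. `append_sample` is a hypothetical helper; the planned append_history works across buckets on State. Keeping only samples strictly newer than the cutoff is what holds a 24h window at a 5-minute cadence to 288 entries.

```python
def append_sample(
    history: list[tuple[int, float]],
    now: int,
    utilization: float,
    retention_hours: int = 24,
) -> list[tuple[int, float]]:
    """Drop samples at or before the retention cutoff, then append the newest one."""
    cutoff = now - retention_hours * 3600
    # Normalize to tuples: history read back from JSON arrives as lists.
    kept = [(int(t), float(u)) for t, u in history if t > cutoff]
    kept.append((now, utilization))
    return kept
```

This version returns a new list instead of mutating in place. Either choice works, as long as the caller stores the result back on the bucket's WindowState.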

predicted_exhaustion_alerted_for_window resets to False whenever the window's resets_at changes (same mechanism that resets alerted_thresholds).
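The same bookkeeping also enforces the one-alert-per-(window, resets_at) dedup rule. Below is a minimal single-bucket sketch of it. WindowState is trimmed to the fields it touches. `roll_window` and `claim_prediction_alert` are hypothetical names, and resetting alerted_thresholds in the same place is an assumption about how the existing mechanism works.

```python
from dataclasses import dataclass, field


@dataclass
class WindowState:
    # Trimmed to the fields this sketch touches.
    resets_at: int
    alerted_thresholds: list[int] = field(default_factory=list)
    predicted_exhaustion_alerted_for_window: bool = False


def roll_window(state: WindowState, fresh_resets_at: int) -> None:
    """Clear per-window dedup state when the bucket's resets_at moves."""
    if fresh_resets_at != state.resets_at:
        state.resets_at = fresh_resets_at
        state.alerted_thresholds = []
        state.predicted_exhaustion_alerted_for_window = False


def claim_prediction_alert(state: WindowState) -> bool:
    """Return True only the first time it is called for a given resets_at."""
    if state.predicted_exhaustion_alerted_for_window:
        return False
    state.predicted_exhaustion_alerted_for_window = True
    return True
```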

Backward compatibility: load_state treats missing v2 fields as their defaults, so a v1 file on disk reads cleanly under v2 code with empty history. The first run after deploy writes v2 back.
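A sketch of reading a per-bucket JSON object, written by either schema version, into the v2 dataclass. It assumes each bucket is stored as a flat object with these keys and that the top-level version field is handled elsewhere. `window_state_from_dict` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WindowState:
    resets_at: int
    utilization: float
    alerted_thresholds: list[int] = field(default_factory=list)
    history: list[tuple[int, float]] = field(default_factory=list)
    predicted_exhaustion_alerted_for_window: bool = False


def window_state_from_dict(raw: dict[str, Any]) -> WindowState:
    """Build a v2 WindowState from a v1 or v2 JSON object; missing v2 fields get defaults."""
    return WindowState(
        resets_at=int(raw["resets_at"]),
        utilization=float(raw["utilization"]),
        alerted_thresholds=list(raw.get("alerted_thresholds", [])),
        # v1 files have no history; JSON turns tuples into lists, so convert back.
        history=[(int(t), float(u)) for t, u in raw.get("history", [])],
        predicted_exhaustion_alerted_for_window=bool(
            raw.get("predicted_exhaustion_alerted_for_window", False)
        ),
    )
```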

Alert format

New kind: predicted_exhaustion.

Plain:

📉 weekly Sonnet projected to exhaust before reset
   current:    62.3%
   projected:  108.4% at reset (resets 2026-05-22 12:00 UTC)
   exhausts:   ~2026-05-21 18:00 UTC (18h before reset)
   burn rate:  11.2%/day
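The plain-text layout can be produced from the fitted slope, as in the sketch below. It assumes fractional utilization, a slope in fraction per second, and a positive slope (the formatter only runs after the trigger fired). The function name and the label argument (e.g. "weekly Sonnet") are hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def format_prediction_plain(
    label: str,
    current_util: float,
    slope_per_second: float,
    resets_at: int,
    now: int,
) -> str:
    """Render the plain predicted_exhaustion alert; utilization is a fraction (1.0 == 100%)."""
    projected = current_util + slope_per_second * (resets_at - now)
    # Time at which the linear projection reaches 100%, rounded to whole seconds.
    exhausts_at = now + round((1.0 - current_util) / slope_per_second)
    hours_before_reset = (resets_at - exhausts_at) / 3600

    def fmt(ts: int) -> str:
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M UTC")

    return "\n".join([
        f"📉 {label} projected to exhaust before reset",
        f"   current:    {current_util * 100:.1f}%",
        f"   projected:  {projected * 100:.1f}% at reset (resets {fmt(resets_at)})",
        f"   exhausts:   ~{fmt(exhausts_at)} ({hours_before_reset:.0f}h before reset)",
        f"   burn rate:  {slope_per_second * SECONDS_PER_DAY * 100:.1f}%/day",
    ])
```

For example, a bucket at 50% that burns 25%/day with three days left projects 125.0% at reset and exhausts 24h before it.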

HTML uses the same blockquote + table style as the rest of the alerts. Severity color depends on how many hours before the reset the projected exhaustion lands:

  • > 72h before reset → blue (info)
  • 24-72h before reset → amber
  • < 24h before reset → red
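The bucket mapping, as a small sketch. The text leaves the exact boundaries open; this version assumes exactly 72h and exactly 24h both count as amber. `severity_for` is a hypothetical name.

```python
def severity_for(hours_before_reset: float) -> str:
    """Map how far ahead of the reset the projected exhaustion lands to a severity color."""
    if hours_before_reset > 72:
        return "blue"  # info
    if hours_before_reset >= 24:
        return "amber"
    return "red"
```

With these boundaries, the 18h example above renders red.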

Dedup: one alert per (window, resets_at) pair. State carries predicted_exhaustion_alerted_for_window to ensure that.

Configuration

All settings are env-overridable; defaults are baked into the ConfigMap:

Env var                            Default  Meaning
PREDICT_ENABLED                    1        Master switch. 0 disables prediction entirely.
PREDICT_WINDOW_HOURS               6        How many hours of recent history to feed the linear fit.
PREDICT_HISTORY_RETENTION_HOURS    24       How much history to retain on disk per bucket.
PREDICT_MIN_SAMPLES                12       Minimum sample count before any prediction fires.
PREDICT_MIN_HISTORY_MINUTES        60       Minimum time span of samples before any prediction fires.
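A sketch of reading these variables with the documented defaults. `PredictConfig` and `load_predict_config` are hypothetical. The wiring's `cfg.predict_enabled` suggests the real fields live on the existing config object instead. The retention check at the end is an assumed sanity check and is not part of the design above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictConfig:
    enabled: bool = True
    window_hours: int = 6
    history_retention_hours: int = 24
    min_samples: int = 12
    min_history_minutes: int = 60


def load_predict_config(env: Mapping[str, str] = os.environ) -> PredictConfig:
    """Read the PREDICT_* variables, falling back to the documented defaults."""

    def get_int(name: str, default: int) -> int:
        raw = env.get(name)
        # Unset or empty means "use the default"; anything else must parse as an int.
        return default if raw is None or raw.strip() == "" else int(raw)

    cfg = PredictConfig(
        enabled=get_int("PREDICT_ENABLED", 1) != 0,
        window_hours=get_int("PREDICT_WINDOW_HOURS", 6),
        history_retention_hours=get_int("PREDICT_HISTORY_RETENTION_HOURS", 24),
        min_samples=get_int("PREDICT_MIN_SAMPLES", 12),
        min_history_minutes=get_int("PREDICT_MIN_HISTORY_MINUTES", 60),
    )
    # Assumed sanity check: the fit window cannot use more history than is retained.
    if cfg.history_retention_hours < cfg.window_hours:
        raise ValueError("PREDICT_HISTORY_RETENTION_HOURS must be >= PREDICT_WINDOW_HOURS")
    return cfg
```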

Implementation notes

New module: src/claude_matrix_bot/reset_watcher/predict.py. Exports:

def compute_predictions(
    state: State,
    fresh: dict[str, WindowSnapshot],
    now: int,
    weekly_buckets: Iterable[str],
    *,
    window_hours: int,
    min_samples: int,
    min_history_minutes: int,
) -> list[Alert]: ...

def append_history(
    state: State,
    fresh: dict[str, WindowSnapshot],
    now: int,
    weekly_buckets: Iterable[str],
    *,
    retention_hours: int,
) -> None: ...

Neither function does I/O (append_history mutates the State it is given in place); the entrypoint composes them with the existing compute_alerts. Wiring order in __main__.py:

old_state = state_io.load_state(cfg.state_path)
alerts, new_state = compute_alerts(...)              # existing
append_history(new_state, fresh, now, ...)           # mutate history on new_state
if cfg.predict_enabled:
    alerts += compute_predictions(new_state, fresh, now, ...)
state_io.save_state(cfg.state_path, new_state)

Tests

Follow the same TDD pattern as the rest of the codebase: pure logic, synthetic data, no network.

Test ideas:

  • Insufficient history → no alert even if naive math projects exhaustion.
  • Flat utilization → no alert (slope ≈ 0; current_util alone < 100%).
  • Steady burn projected to exhaust early → alert fires, deduped on resets_at.
  • Burn rate decreasing → no alert (regression captures slowing trend).
  • Window rollover → dedup state resets, new prediction can fire.
  • History retention → samples older than the cutoff drop on save.
  • Linear fit unit tests: known slope inputs produce expected outputs.

Open questions to revisit when implementing

  • Should the alert text show the most recent burn rate (last hour) and the fitted burn rate side-by-side? Could be useful when the trend just changed and the long-window fit lags reality.
  • Is there value in a "trend reversed" message, e.g. when the bot previously alerted on projected exhaustion and usage has since plateaued, so the projection moved past the reset? Probably yes, for parity with the threshold alerts, but it adds state machinery; defer until the feature has seen real use.
  • Phase 3 candidate: time-of-week awareness. Most users have weekly patterns (heavy weekdays, light weekends). A linear fit over Tue→Fri overestimates Sat consumption.