Forward-looking signal: extrapolate the weekly buckets' burn rate from a rolling history and alert when the projection lands before the natural reset. Captures the design decisions made today (weekly buckets only, linear regression over 6h, schema v2 with backward-compatible read, configurable via env). Not yet implemented.
Phase 2 — Usage prediction
Status: design, not implemented.
Goal
Today the reset_watcher only reports state changes that have already happened (a threshold was crossed, a reset was shifted). The Pro/Max weekly budget can still get exhausted before the natural reset, and the first signal is the 80% alert — which is too late to change behavior for the rest of the week.
Phase 2 adds a forward-looking signal: extrapolate the current burn rate to project when each weekly bucket will exhaust, and alert if the projection lands before the natural reset.
Scope
In scope
- The three weekly buckets: seven_day, seven_day_opus, seven_day_sonnet.
- A burn rate computed from a rolling history of recent samples.
- A new alert kind: predicted_exhaustion.
- State schema bump to retain history (backward-compatible on read).
Out of scope
- The 5-hour bucket. Five hours is too short for trend fitting to be actionable; a single heavy hour wrecks the slope, and "you'll exhaust in 2h" isn't a useful prediction.
- Statistical confidence intervals. Single-tier alert; no "low/medium/high certainty" tiers.
- Multi-model ensembles, ML. A linear fit is the right complexity for the signal-to-noise ratio of this data.
Trigger condition
After computing burn rate, alert when:
current_util + burn_rate * (resets_at - now) > 100%
i.e. "if usage continues at the current rate, the bucket will hit 100% before the natural reset".
Gating: do not alert until at least PREDICT_MIN_SAMPLES history points
exist and they span at least PREDICT_MIN_HISTORY_MINUTES. Without these
gates, the very first poll after deploy fires a noisy prediction off two
data points.
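A minimal sketch of the trigger and its two gates, assuming samples are (unix_seconds, utilization in percent) tuples sorted oldest first. The helper names has_enough_history and should_alert are hypothetical and not part of the existing codebase.

```python
from typing import Sequence

Sample = tuple[int, float]  # (unix_seconds, utilization in percent); assumed layout


def has_enough_history(
    samples: Sequence[Sample],
    min_samples: int = 12,          # PREDICT_MIN_SAMPLES default
    min_history_minutes: int = 60,  # PREDICT_MIN_HISTORY_MINUTES default
) -> bool:
    """Return True only when both the count gate and the time-span gate pass."""
    if len(samples) < min_samples:
        return False
    span_seconds = samples[-1][0] - samples[0][0]
    return span_seconds >= min_history_minutes * 60


def should_alert(
    current_util: float,
    burn_rate_per_second: float,
    resets_at: int,
    now: int,
) -> bool:
    """Trigger condition: projected utilization at the natural reset exceeds 100%."""
    projected = current_util + burn_rate_per_second * (resets_at - now)
    return projected > 100.0
```

With the defaults and a 5-minute poll cadence, 12 samples span only 55 minutes, so the time-span gate is the one that binds: the first prediction can fire on the 13th sample.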
Considered and rejected:
- Time-to-exhaustion threshold ("only alert if exhaustion is within 24h of now"). Suppresses early-warning signals — defeats the point.
- Headroom percentage ("alert only if projected end-of-window util > 120%"). Filters minor overshoots but also filters legitimate "you'll exhaust 6h before reset" cases.
Burn rate computation
Linear regression over a configurable window of recent samples. The slope
is in percent per second; the alert renders it as %/day (slope × 86,400) for readability.
samples = history[-N:] filtered to those within PREDICT_WINDOW_HOURS
slope, intercept = np.polyfit(timestamps, utilizations, 1)
# or hand-rolled OLS to avoid the numpy dependency
projected_util_at_reset = current_util + slope * (resets_at - now)
Considered and rejected:
- First/last delta. Susceptible to one outlier on either end.
- Theil-Sen or other robust regressions. Better tolerance for outliers, but roughly 10x the code and probably not warranted: Anthropic's reported utilization is monotonically non-decreasing within a window, so outliers are rare by construction.
- EMA of instantaneous slope. Smoother but introduces tunable parameters that complicate the configuration surface.
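The np.polyfit call above is shorthand for the fit. The implementation should avoid numpy (a heavy dependency; the whole image grows by ~50 MB) and hand-roll OLS instead, which takes about 10 lines.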
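A sketch of what the hand-rolled fit could look like, using only the standard library. The names ols_slope and fit_burn_rate are hypothetical, and samples are assumed to be (unix_seconds, utilization in percent) tuples.

```python
from typing import Sequence

Sample = tuple[int, float]  # (unix_seconds, utilization in percent); assumed layout


def ols_slope(samples: Sequence[Sample]) -> float:
    """Least-squares slope of utilization over time, in percent per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    # Centre on the means so large unix timestamps do not lose precision.
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    if s_tt == 0:
        raise ValueError("all samples share one timestamp; slope is undefined")
    s_tu = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    return s_tu / s_tt


def fit_burn_rate(history: Sequence[Sample], now: int, window_hours: int = 6) -> float:
    """Fit only the samples that fall inside the trailing PREDICT_WINDOW_HOURS window."""
    cutoff = now - window_hours * 3600
    recent = [(t, u) for t, u in history if t >= cutoff]
    return ols_slope(recent)
```

Multiplying the result by 86,400 gives the %/day figure shown in the alert. The sample-count and time-span gates are expected to run before this function, so the ValueError paths should only be reachable from tests.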
State schema bump
Schema goes from version: 1 to version: 2. The new field on
WindowState:
@dataclass
class WindowState:
    resets_at: int
    utilization: float
    alerted_thresholds: list[int] = field(default_factory=list)
    # New in v2:
    history: list[tuple[int, float]] = field(default_factory=list)
    predicted_exhaustion_alerted_for_window: bool = False
history holds (unix_seconds, utilization_fractional) tuples, capped at
PREDICT_HISTORY_RETENTION_HOURS worth of samples (default 24h). At a
5-minute poll cadence that's ≤ 288 entries per bucket; at roughly 20 bytes
of JSON per entry, that is about 6 KB per bucket and under 20 KB across
all three weekly buckets, which is still trivial.
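The retention cap could be enforced with a simple cutoff filter each time a sample is appended. This is a sketch only: trim_history is a hypothetical helper, and the (unix_seconds, utilization) tuple layout is assumed from the description above.

```python
Sample = tuple[int, float]  # (unix_seconds, utilization); assumed layout


def trim_history(
    history: list[Sample],
    now: int,
    retention_hours: int = 24,  # PREDICT_HISTORY_RETENTION_HOURS default
) -> list[Sample]:
    """Return a new list holding only samples newer than the retention cutoff."""
    cutoff = now - retention_hours * 3600
    return [(ts, util) for ts, util in history if ts >= cutoff]


# Upper bound on retained entries at a 5-minute cadence over 24 hours.
MAX_ENTRIES_PER_BUCKET = 24 * 60 // 5
```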
predicted_exhaustion_alerted_for_window resets to False whenever the
window's resets_at changes (same mechanism that resets
alerted_thresholds).
Backward compatibility: load_state treats missing v2 fields as their
defaults, so a v1 file on disk reads cleanly under v2 code with empty
history. The first run after deploy writes v2 back.
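One way the backward-compatible read could look, assuming each window is stored as a JSON object keyed by field name. That on-disk layout and the helper name window_from_dict are assumptions; the real load_state may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WindowState:
    resets_at: int
    utilization: float
    alerted_thresholds: list[int] = field(default_factory=list)
    # New in v2:
    history: list[tuple[int, float]] = field(default_factory=list)
    predicted_exhaustion_alerted_for_window: bool = False


def window_from_dict(raw: dict[str, Any]) -> WindowState:
    """Build a WindowState, falling back to defaults for absent v2 fields."""
    return WindowState(
        resets_at=int(raw["resets_at"]),
        utilization=float(raw["utilization"]),
        alerted_thresholds=list(raw.get("alerted_thresholds", [])),
        # JSON has no tuple type, so history entries come back as 2-element lists.
        history=[(int(ts), float(util)) for ts, util in raw.get("history", [])],
        predicted_exhaustion_alerted_for_window=bool(
            raw.get("predicted_exhaustion_alerted_for_window", False)
        ),
    )
```

A v1 record such as {"resets_at": 100, "utilization": 12.5} loads with an empty history and the dedup flag cleared, which is the behaviour described above.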
Alert format
New kind: predicted_exhaustion.
Plain:
📉 weekly Sonnet projected to exhaust before reset
current: 62.3%
projected: 108.4% at reset (resets 2026-05-22 12:00 UTC)
exhausts: ~2026-05-21 14:00 UTC (22h before reset)
burn rate: 9.2%/day
HTML uses the same blockquote + table style as the rest of the alerts. Severity color depends on hours-before-reset bucket:
- > 72h before reset → blue (info)
- 24–72h before reset → amber
- < 24h before reset → red
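The severity buckets above depend on how far ahead of the reset the projected exhaustion falls. A sketch of that derivation follows; both function names are hypothetical, and treating exactly 24h and exactly 72h as amber is an assumption, since the boundaries are not pinned down here.

```python
def hours_before_reset(
    current_util: float,
    burn_rate_per_second: float,
    resets_at: int,
    now: int,
) -> float:
    """Hours between the projected exhaustion time and the natural reset."""
    if burn_rate_per_second <= 0:
        raise ValueError("a non-positive burn rate never reaches 100%")
    seconds_to_exhaustion = (100.0 - current_util) / burn_rate_per_second
    exhausts_at = now + seconds_to_exhaustion
    return (resets_at - exhausts_at) / 3600


def severity_color(hours: float) -> str:
    """Map hours-before-reset onto the three severity colors."""
    if hours > 72:
        return "blue"
    if hours >= 24:  # assumption: both boundary values count as amber
        return "amber"
    return "red"
```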
Dedup: one alert per (window, resets_at) pair. State carries
predicted_exhaustion_alerted_for_window to ensure that.
Configuration
All env-overridable; defaults baked into ConfigMap:
| Env var | Default | Meaning |
|---|---|---|
| PREDICT_ENABLED | 1 | Master switch. 0 disables prediction entirely. |
| PREDICT_WINDOW_HOURS | 6 | How many hours of recent history to feed the linear fit. |
| PREDICT_HISTORY_RETENTION_HOURS | 24 | How much history to retain on disk per bucket. |
| PREDICT_MIN_SAMPLES | 12 | Minimum sample count before any prediction fires. |
| PREDICT_MIN_HISTORY_MINUTES | 60 | Minimum time-span of samples before any prediction fires. |
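A sketch of how these variables might be read, with the defaults from the table. PredictConfig and load_predict_config are hypothetical names; the existing config loader may already have its own conventions for parsing booleans and integers.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PredictConfig:
    enabled: bool = True
    window_hours: int = 6
    history_retention_hours: int = 24
    min_samples: int = 12
    min_history_minutes: int = 60


def load_predict_config(env: Mapping[str, str] = os.environ) -> PredictConfig:
    """Read the PREDICT_* variables, falling back to the documented defaults."""
    return PredictConfig(
        # Assumption: only the literal string "0" disables prediction.
        enabled=env.get("PREDICT_ENABLED", "1") != "0",
        window_hours=int(env.get("PREDICT_WINDOW_HOURS", "6")),
        history_retention_hours=int(env.get("PREDICT_HISTORY_RETENTION_HOURS", "24")),
        min_samples=int(env.get("PREDICT_MIN_SAMPLES", "12")),
        min_history_minutes=int(env.get("PREDICT_MIN_HISTORY_MINUTES", "60")),
    )
```

Passing the environment in as a mapping keeps the loader testable without touching the process environment.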
Implementation notes
New module: src/claude_matrix_bot/reset_watcher/predict.py. Exports:
def compute_predictions(
    old_state: State,
    fresh: dict[str, WindowSnapshot],
    now: int,
    weekly_buckets: Iterable[str],
    *,
    window_hours: int,
    min_samples: int,
    min_history_minutes: int,
) -> list[Alert]: ...

def append_history(
    state: State,
    fresh: dict[str, WindowSnapshot],
    now: int,
    weekly_buckets: Iterable[str],
    *,
    retention_hours: int,
) -> None: ...
Neither function touches disk or network; append_history mutates the state it is
given in place (hence the None return). The entrypoint composes them with the
existing compute_alerts. Wiring order in __main__.py:
old_state = state_io.load_state(cfg.state_path)
alerts, new_state = compute_alerts(...) # existing
append_history(new_state, fresh, now, ...) # mutate history on new_state
if cfg.predict_enabled:
    alerts += compute_predictions(new_state, fresh, now, ...)
state_io.save_state(cfg.state_path, new_state)
Tests
The same TDD pattern the rest of the codebase uses. Pure logic, synthetic data, no network.
Test ideas:
- Insufficient history → no alert even if naive math projects exhaustion.
- Flat utilization → no alert (slope ≈ 0; current_util alone < 100%).
- Steady burn projected to exhaust early → alert fires, deduped on resets_at.
- Burn rate decreasing → no alert once the fitted slope no longer projects past 100% (regression captures the slowing trend).
- Window rollover → dedup state resets, new prediction can fire.
- History retention → samples older than the cutoff drop on save.
- Linear fit unit tests: known slope inputs produce expected outputs.
Open questions to revisit when implementing
- Should the alert text show the most recent burn rate (last hour) and the fitted burn rate side-by-side? Could be useful when the trend just changed and the long-window fit lags reality.
- Is there value in a "trend reversed" message, e.g. when the bot previously alerted on projected exhaustion and usage has since plateaued so that the projection moved past the reset? Probably yes for parity with the threshold alerts, but it adds state machinery; defer until real usage shows the need.
- Phase 3 candidate: time-of-week awareness. Most users have weekly patterns (heavy weekdays, light weekends). A linear fit over Tue→Fri overestimates Sat consumption.