docs: add reddit-reader design spec

Architecture, data flow, schema, gRPC API, LLM abstraction,
TUI layout, config/setup, error handling, and testing strategy.
Commit 359a107571, 2026-04-03 10:54:30 +02:00
# Reddit Reader — Design Spec
## Overview
A Go TUI application that monitors subreddits for interesting posts, adds them to a reading list, and generates 5-bullet summaries using a local LLM (Ollama/llama.cpp) or Mistral Small 4 as fallback. Runs as a systemd user service for continuous monitoring; the TUI connects on launch.
## Architecture
Single Go binary with three subcommands:
- `reddit-reader serve` — monitor daemon + gRPC server
- `reddit-reader tui` — Bubble Tea client, connects via gRPC
- `reddit-reader setup` — interactive first-run wizard
### Package Layout
```
cmd/
serve.go — cobra subcommand: starts monitor + gRPC server
tui.go — cobra subcommand: launches TUI client
setup.go — cobra subcommand: first-run wizard
root.go — cobra root command
internal/
monitor/ — Reddit polling loop, orchestrates filter pipeline
filter/ — keyword/regex pre-filter + LLM relevance scoring
llm/ — Summarizer interface, Ollama/llama.cpp + Mistral backends
store/ — SQLite operations (modernc.org/sqlite, pure Go)
grpc/
server/ — gRPC service implementation
client/ — gRPC client used by TUI
tui/ — Bubble Tea views and models
config/ — TOML config parsing, env var overlay, first-run setup
proto/
redditreader.proto — protobuf service definition
```
## Data Flow
### Monitor Loop (runs in `serve`)
```
every 2min, for each subreddit:
1. go-reddit fetches /new or /hot listings
2. Dedup: skip posts already in SQLite (keyed by reddit fullname t3_xxxxx)
3. Keyword/regex pre-filter: match title/flair against configured patterns (cheap, no API calls)
4. LLM relevance scoring: "rate 0.0-1.0 how relevant to [interests]" — includes recent feedback as few-shot context
5. Posts above relevance threshold get 5-bullet summary from LLM
6. Insert post + summary + score into SQLite
7. Push to connected TUI clients via gRPC streaming
```
### LLM Call Budget
With 10-25 subreddits polled every 2 minutes, only posts passing the keyword pre-filter reach the LLM. Expected: 5-15 LLM calls per cycle, well within local model throughput and Mistral free-tier limits.
### Feedback Loop
User thumbs-up/down votes in TUI are stored in SQLite. Recent feedback examples become few-shot context in the relevance scoring prompt ("posts like X were marked interesting, posts like Y were not"). No fine-tuning — prompt engineering with history.
## SQLite Schema
```sql
CREATE TABLE subreddits (
name TEXT PRIMARY KEY,
enabled INTEGER DEFAULT 1,
poll_sort TEXT DEFAULT 'new',
added_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE filters (
id INTEGER PRIMARY KEY,
subreddit TEXT REFERENCES subreddits(name),
pattern TEXT NOT NULL,
is_regex INTEGER DEFAULT 0
);
CREATE TABLE posts (
id TEXT PRIMARY KEY, -- reddit fullname t3_xxxxx
subreddit TEXT NOT NULL,
title TEXT NOT NULL,
author TEXT,
url TEXT,
selftext TEXT,
score INTEGER,
created_utc TEXT,
fetched_at TEXT DEFAULT (datetime('now')),
relevance REAL,
summary TEXT,
read INTEGER DEFAULT 0,
starred INTEGER DEFAULT 0,
dismissed INTEGER DEFAULT 0
);
CREATE TABLE feedback (
id INTEGER PRIMARY KEY,
post_id TEXT REFERENCES posts(id),
vote INTEGER NOT NULL, -- +1 interesting, -1 not
created_at TEXT DEFAULT (datetime('now'))
);
```
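Two illustrative queries against this schema (the exact SQL in `store/` may differ): the default reading-list view and the recent-feedback fetch used for few-shot context.

```sql
-- Reading list: undismissed posts, unread first, then by relevance
SELECT id, subreddit, title, relevance, summary
FROM posts
WHERE dismissed = 0
ORDER BY read ASC, relevance DESC;

-- Recent feedback examples for the relevance-scoring prompt
SELECT p.title, f.vote
FROM feedback f JOIN posts p ON p.id = f.post_id
ORDER BY f.created_at DESC
LIMIT 10;
```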
## gRPC Service
```protobuf
service RedditReader {
rpc StreamPosts(StreamRequest) returns (stream Post);
rpc ListPosts(ListRequest) returns (ListResponse);
rpc UpdatePost(UpdateRequest) returns (Post);
rpc SubmitFeedback(FeedbackRequest) returns (FeedbackResponse);
rpc ListSubreddits(Empty) returns (SubredditList);
rpc AddSubreddit(AddSubredditRequest) returns (Subreddit);
rpc RemoveSubreddit(RemoveRequest) returns (Empty);
rpc UpdateFilters(FilterRequest) returns (FilterResponse);
rpc Status(Empty) returns (StatusResponse);
}
```
- `StreamPosts`: server-side stream, TUI subscribes on launch for real-time pushes
- `ListPosts`: supports filtering by subreddit, read/unread, starred, date range
- All mutations go through gRPC — single writer to SQLite, no lock contention
- Socket path: `$XDG_RUNTIME_DIR/reddit-reader.sock` (fallback `/tmp/reddit-reader.sock`)
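The service definition leaves the message types open; one plausible sketch mirroring the `posts` columns (field names and numbers are assumptions, not final):

```protobuf
message Post {
  string id = 1;         // reddit fullname t3_xxxxx
  string subreddit = 2;
  string title = 3;
  string url = 4;
  double relevance = 5;
  string summary = 6;
  bool read = 7;
  bool starred = 8;
}

message StreamRequest {
  string since_post_id = 1; // resume point after a reconnect
}
```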
## LLM Abstraction
```go
type Summarizer interface {
Score(ctx context.Context, post Post, interests Interests) (float64, error)
Summarize(ctx context.Context, post Post) (string, error)
}
```
### Backends
| Backend | Connection | When Used |
|---------|-----------|-----------|
| Ollama | OpenAI-compatible HTTP at `localhost:11434` | Default — setup probes for it |
| llama.cpp server | OpenAI-compatible HTTP at configurable port | Alternative local |
| Mistral API | `somegit.dev/vikingowl/mistral-go-sdk` | Fallback when no local model available |
Ollama and llama.cpp share one implementation (same OpenAI-compatible API, different base URLs). Mistral uses the dedicated SDK.
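Because both local backends speak the OpenAI-compatible chat API, the shared implementation only varies in its base URL. A sketch of building the request (response handling omitted; the helper name is hypothetical):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatRequest builds an OpenAI-compatible /v1/chat/completions request.
// Ollama and llama.cpp both accept this shape, so one implementation
// covers both; only baseURL differs between them.
func chatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := chatRequest("http://localhost:11434", "mistral-small", "Summarize in 5 bullets: ...")
	fmt.Println(req.Method, req.URL)
}
```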
### Backend Selection (in `setup`)
1. Probe `localhost:11434` — if Ollama responds, use it and ask which model to use (default `mistral-small`)
2. Probe configurable llama.cpp endpoint if set
3. Fall back to Mistral API — prompt for API key
4. Store choice in config, overridable via env vars
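The selection order reduces to a small decision function. A sketch with hypothetical names, using a raw TCP probe where the real wizard would more likely hit Ollama's HTTP API:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// probe reports whether anything is listening at addr (e.g. localhost:11434).
// A real probe would also verify the HTTP API responds, not just the port.
func probe(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// pickBackend mirrors the selection order above: Ollama, then llama.cpp,
// then the Mistral API if a key is available.
func pickBackend(ollamaUp, llamaUp bool, mistralKey string) (string, error) {
	switch {
	case ollamaUp:
		return "ollama", nil
	case llamaUp:
		return "llamacpp", nil
	case mistralKey != "":
		return "mistral", nil
	}
	return "", errors.New("no LLM backend available; rerun reddit-reader setup")
}

func main() {
	backend, err := pickBackend(probe("localhost:11434"), false, "")
	fmt.Println(backend, err)
}
```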
### Relevance Prompt Includes
- User's declared interests (from config)
- Last N feedback examples as few-shot context (from SQLite)
- Post title + first ~500 chars of selftext
## TUI
Built with Bubble Tea + Lip Gloss.
### Views
- **Reading List** — default view, scrollable post list sorted by relevance, unread first
- **Starred** — favorited posts
- **Archive** — dismissed and read posts
- **Settings** — manage subreddits, keywords, LLM backend, relevance threshold (via gRPC)
### Post List
- `*` unread / `o` read indicators
- Shows subreddit, relevance score, relative time
- Enter expands to show 5-bullet summary in detail pane
### Keybindings
- `j/k` navigate, `g/G` top/bottom
- `enter` expand/collapse summary
- `s` star, `d` dismiss
- `o` open in browser
- `+/-` vote on relevance
- `/` filter, `?` help
- `tab` switch views
### On Launch
1. Connect to gRPC Unix socket
2. If connection fails and socket activation is configured, systemd starts daemon
3. `ListPosts` populates initial view
4. Subscribe to `StreamPosts` for live updates
## Configuration
### Config File
`~/.config/reddit-reader/config.toml`
```toml
[reddit]
client_id = ""
client_secret = ""
username = ""
password = ""
[llm]
backend = "ollama"
endpoint = "localhost:11434"
model = "mistral-small"
api_key = ""
relevance_threshold = 0.6
[interests]
description = "" # free-text, e.g. "Go programming, NixOS, systems programming, Linux kernel"
[monitor]
poll_interval = "2m"
max_posts_per_poll = 25
[grpc]
socket = "$XDG_RUNTIME_DIR/reddit-reader.sock"
```
Env var overrides: `REDDIT_READER_REDDIT_CLIENT_ID`, `REDDIT_READER_LLM_API_KEY`, etc.
### First-Run Setup (`reddit-reader setup`)
Interactive terminal wizard:
1. Reddit OAuth — walk through creating a script app, prompt for credentials
2. LLM backend — probe local, let user pick or enter Mistral key
3. Subreddits — add initial subreddits with keyword filters
4. Interests — free-text description for relevance prompts
5. Validate — test Reddit auth, test LLM responds, create SQLite DB
6. Systemd — optionally write and enable service + socket units
## Systemd Units
### `reddit-reader.service`
```ini
[Unit]
Description=Reddit Reader Monitor
After=network-online.target
[Service]
Type=simple
ExecStart=%h/.local/bin/reddit-reader serve
Restart=on-failure
[Install]
WantedBy=default.target
```
### `reddit-reader.socket`
```ini
[Unit]
Description=Reddit Reader Socket
[Socket]
ListenStream=%t/reddit-reader.sock
[Install]
WantedBy=sockets.target
```
The daemon can also be started manually (`systemctl --user start reddit-reader.service`). The socket unit is always enabled, so if the daemon is not running, systemd starts it on the first TUI connection.
## Error Handling
- **Reddit API failures**: exponential backoff per subreddit, log warnings. After 5 consecutive failures, disable subreddit and notify TUI via gRPC stream.
- **LLM unavailable**: store posts with `relevance = NULL`, `summary = NULL`. Retry on next cycle. TUI shows "pending summary" state.
- **SQLite write errors**: fatal for daemon. Fail fast, let systemd restart.
- **gRPC connection lost**: TUI shows disconnected state, retries with backoff, resyncs via `ListPosts` on reconnect.
- **Config missing/invalid**: `serve` and `tui` check on startup, point to `reddit-reader setup`.
## Testing Strategy
- **Unit tests**: filter pipeline (keyword, regex), config parsing, LLM prompt construction, SQLite store operations (in-memory SQLite)
- **Integration tests**: gRPC server/client round-trips with real SQLite, monitor loop with mocked Reddit API responses
- **No mocking of SQLite** — use real in-memory databases
- **TDD**: tests first for store operations, filter logic, gRPC service methods
- **Interfaces for boundaries**: `Summarizer`, Reddit client, store — mock only at system boundaries
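As an example of the table-driven, mock-free style: a hypothetical `parseScore` helper that turns the LLM's free-text relevance reply into a clamped float, exercised case by case. In the repo this would live in a `_test.go` file using `*testing.T`:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var numRe = regexp.MustCompile(`\d+(?:\.\d+)?`)

// parseScore extracts the first number from the model's reply and clamps
// it to [0, 1], tolerating chatty answers like "Relevance: 0.4 because ...".
func parseScore(reply string) (float64, error) {
	m := numRe.FindString(reply)
	if m == "" {
		return 0, fmt.Errorf("no score in %q", reply)
	}
	f, err := strconv.ParseFloat(m, 64)
	if err != nil {
		return 0, err
	}
	if f > 1 {
		f = 1
	}
	return f, nil
}

func main() {
	cases := []struct {
		reply string
		want  float64
	}{
		{"0.85", 0.85},
		{"Relevance: 0.25, because it mentions Go.", 0.25},
		{"10", 1.0},
	}
	for _, c := range cases {
		got, err := parseScore(c.reply)
		if err != nil || got != c.want {
			panic("case failed: " + c.reply)
		}
		fmt.Println(c.reply, "->", got)
	}
}
```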
## Dependencies
| Package | Purpose |
|---------|---------|
| `github.com/vartanbeno/go-reddit/v2` | Reddit API client |
| `somegit.dev/vikingowl/mistral-go-sdk` | Mistral API backend |
| `modernc.org/sqlite` | Pure-Go SQLite |
| `github.com/charmbracelet/bubbletea` | TUI framework |
| `github.com/charmbracelet/lipgloss` | TUI styling |
| `github.com/spf13/cobra` | CLI subcommands |
| `github.com/pelletier/go-toml/v2` | Config parsing |
| `google.golang.org/grpc` | gRPC |
| `google.golang.org/protobuf` | Protobuf codegen |
## Go Version
Go 1.26.1 — use range-over-func iterators and other recent language features where appropriate.