Checkpoint/Resume for long-running init operations
Problem
When running repowise init on large repositories, the process can take many hours. If the process is interrupted (timeout, crash, user termination, network issues), all progress is lost and the init must restart from scratch.
My experience:
- Repository: ~553 pages to generate
- Process ran for 8+ hours (416/553 pages completed)
- Process timed out after ~11,218 seconds
- Result: All 416 generated pages were lost - the SQL database showed 0 pages
The pages existed in LanceDB (_transactions/, _versions/, data/ folders with 416 fragments) but were not committed to SQL and were not recoverable via standard commands. I had to manually write a Python script to extract data from LanceDB and import it to SQL.
I'm frustrated when I spend 8 hours waiting for indexing only to lose everything because of a timeout or interruption.
Proposed Solution
1. Periodic checkpoints - Commit to SQL database every N pages (configurable)
repowise init . --checkpoint-interval 50 # Save every 50 pages
2. Auto-resume on re-run - repowise init should detect partial state and continue:
$ repowise init .
Resuming from checkpoint: 416/553 pages already generated
Generating pages... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 137/553
3. Graceful shutdown - SIGTERM/SIGINT should trigger final checkpoint before exit
4. Recovery command (optional, for cases where checkpoint failed):
repowise recover . # Scan LanceDB and import uncommitted pages to SQL
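To make the intent concrete, here is a minimal sketch of how items 1-3 could fit together: commit to SQL every N pages, skip already-committed pages on re-run, and checkpoint on SIGTERM/SIGINT. The `wiki_pages` schema, the page dict shape, and `generate_pages` itself are hypothetical illustrations, not repowise internals.

```python
import signal
import sqlite3

CHECKPOINT_INTERVAL = 50  # hypothetical equivalent of --checkpoint-interval 50

def generate_pages(pages, db_path="state.db"):
    """Generate pages with periodic SQL checkpoints, resume, and graceful shutdown."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wiki_pages (slug TEXT PRIMARY KEY, body TEXT)"
    )
    conn.commit()

    stop_requested = False

    def request_stop(signum, frame):
        # Graceful shutdown: finish the current page, checkpoint, then exit.
        nonlocal stop_requested
        stop_requested = True

    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)

    # Auto-resume: skip pages an earlier (interrupted) run already committed.
    done = {row[0] for row in conn.execute("SELECT slug FROM wiki_pages")}
    pending = [p for p in pages if p["slug"] not in done]

    for i, page in enumerate(pending, 1):
        conn.execute(
            "INSERT OR REPLACE INTO wiki_pages VALUES (?, ?)",
            (page["slug"], page["body"]),
        )
        if i % CHECKPOINT_INTERVAL == 0:
            conn.commit()  # periodic checkpoint: this progress survives a crash
        if stop_requested:
            break
    conn.commit()  # final checkpoint before exit
    conn.close()
```

With this shape, an interrupted run loses at most `CHECKPOINT_INTERVAL - 1` pages of work instead of all of them, and re-running the same command picks up where the last commit left off.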
Alternatives Considered
Workaround I used:
- Manually extracted data from .repowise/lancedb/wiki_pages.lance/ using Python + lancedb
- Created state.json manually with page count
- Inserted pages into SQL via sqlite3
- Rebuilt FTS index manually
This is too complex for average users.
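For reference, the SQL half of that workaround can be sketched roughly as below: given pages already pulled out of LanceDB (e.g. via `lancedb.connect(...).open_table(...).to_pandas()`), insert them into SQLite and rebuild a full-text index. The `wiki_pages`/`wiki_fts` schema here is a guess at repowise's layout, not the real one, and FTS5 availability depends on the SQLite build.

```python
import sqlite3

def import_recovered_pages(pages, db_path="wiki.db"):
    """Import pages recovered from LanceDB into SQLite and rebuild the FTS index.

    `pages` is a list of {"slug": ..., "body": ...} dicts (hypothetical schema).
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wiki_pages (slug TEXT PRIMARY KEY, body TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO wiki_pages VALUES (:slug, :body)", pages
    )
    # Rebuild the full-text index from scratch (FTS5 external-content table).
    conn.execute("DROP TABLE IF EXISTS wiki_fts")
    conn.execute(
        "CREATE VIRTUAL TABLE wiki_fts USING fts5(slug, body, content='wiki_pages')"
    )
    conn.execute("INSERT INTO wiki_fts(wiki_fts) VALUES ('rebuild')")
    conn.commit()
    conn.close()
```

Having to reverse-engineer this by hand is exactly what a built-in `repowise recover` would avoid.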
Other approaches:
- External process manager (systemd, supervisord) - doesn't help, data loss occurs within repowise
- Smaller batch sizes - still loses all progress if interrupted mid-batch
- repowise update instead of init - only works for git changes, not for resume after crash
Additional Context
Environment:
- repowise: latest
- Repository size: 553 pages
- Runtime before interruption: ~8 hours
- Provider: litellm with zai/glm-5
Why this matters:
Large codebases (1000+ files, enterprise repos) are common. For these users, repowise is unusable in production environments where long-running processes may be interrupted. A single timeout wastes hours of compute time and API costs.
Relevant code locations:
- generation_jobs table tracks progress but doesn't seem to implement checkpoints
- LanceDB has uncommitted transactions in the _transactions/ folder
- repowise doctor detects "Coordinator drift" but can't fix it automatically