rs_inventory_q / inventory_q: misleading 'Converged' print + snapshot off-by-one in Q-learning loop

While reviewing [`lectures/rs_inventory_q.md`](https://github.com/QuantEcon/lecture-python.myst/blob/main/lectures/rs_inventory_q.md) as part of mirroring it into [`QuantEcon/lecture-dp`](https://github.com/QuantEcon/lecture-dp/pull/20), Copilot flagged three things — all real, none blocking, but worth a small follow-up here.

## Issue 1 — `solve_rs_inventory_model` prints "Converged" unconditionally

`rs_inventory_q.md` (and the same pattern in `inventory_q.md` line 331):

```python
def solve_rs_inventory_model(v_init, model, max_iter=10_000, tol=1e-6):
    v = v_init.copy()
    i, error = 0, tol + 1
    while i < max_iter and error > tol:
        new_v = T_rs(v, model)
        error = np.max(np.abs(new_v - v))
        i += 1
        v = new_v
    print(f"Converged in {i} iterations with error {error:.2e}")
    ...
```

If the loop exits because `i` reaches `max_iter` while `error > tol`, the print still claims convergence. With the default 10k cap that won't bite at the configured parameters, but it's misleading for anyone who tweaks settings.

Suggestion: only print "Converged" when `error <= tol`, otherwise warn (or raise). Same fix applies in `inventory_q.md`.
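A minimal sketch of that suggestion, not the lecture's actual code: `solve_by_iteration` and `T_toy` are hypothetical names, and a toy contraction stands in for `T_rs(v, model)` so the snippet runs on its own. The only change relative to the loop above is the check after the loop exits.

```python
import warnings
import numpy as np

def solve_by_iteration(T, v_init, max_iter=10_000, tol=1e-6):
    """Iterate v <- T(v); report convergence only if the tolerance was met.

    Hypothetical helper: `T` stands in for `lambda v: T_rs(v, model)`.
    """
    v = v_init.copy()
    i, error = 0, tol + 1
    while i < max_iter and error > tol:
        new_v = T(v)
        error = np.max(np.abs(new_v - v))
        i += 1
        v = new_v
    converged = bool(error <= tol)
    if converged:
        print(f"Converged in {i} iterations with error {error:.2e}")
    else:
        # The loop stopped on the iteration cap, not on the tolerance.
        warnings.warn(
            f"Stopped at max_iter={max_iter} with error {error:.2e} > tol={tol:.2e}"
        )
    return v, converged

# Toy contraction with fixed point 2.0 (an assumption for illustration only).
T_toy = lambda v: 0.5 * v + 1.0

v_ok, ok = solve_by_iteration(T_toy, np.zeros(3))                # meets tol
v_cut, cut = solve_by_iteration(T_toy, np.zeros(3), max_iter=3)  # hits the cap
```

Returning the flag alongside `v` is one possible design; raising an exception instead would be the stricter variant mentioned above.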

## Issue 2 — snapshot recorded at the *start* of the loop iteration

`rs_inventory_q.md` (`q_learning_rs_kernel`):

```python
for t in range(n_steps):
    # Record policy snapshot if needed
    if snap_idx < n_snaps and t == snapshot_steps[snap_idx]:
        snapshots[snap_idx] = greedy_policy_from_q_rs(q, K)
        snap_idx += 1
    ...
    q[x, a] = (1 - α) * q[x, a] + α * target
    ...
```

A snapshot taken when `t == s` captures `q` *before* the update for iteration `t`, that is, after exactly `s` completed updates (iterations `0` through `s - 1`). The value recorded for "step n" is therefore the intended one, but the loop only reaches `t = n` if it runs at least `n + 1` iterations.

That's exactly what the caller does, but it's load-bearing for the narrative…
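To make the indexing concrete, here is a toy reproduction under stated assumptions: `toy_kernel` is a hypothetical stand-in for `q_learning_rs_kernel` in which an integer update counter replaces `q`, and the snapshot check sits at the start of the iteration as in the excerpt above.

```python
import numpy as np

def toy_kernel(n_steps, snapshot_steps):
    """Hypothetical stand-in: count updates, snapshot at the start of iteration t."""
    updates = 0  # stands in for the state of q
    # -1 marks a snapshot slot that was never written
    snapshots = np.full(len(snapshot_steps), -1, dtype=np.int64)
    snap_idx = 0
    for t in range(n_steps):
        if snap_idx < len(snapshot_steps) and t == snapshot_steps[snap_idx]:
            snapshots[snap_idx] = updates  # q as it is *before* update t
            snap_idx += 1
        updates += 1  # stands in for the q[x, a] update
    return updates, snapshots

n = 5
snap_steps = np.array([2, n], dtype=np.int64)

final_n, snaps_n = toy_kernel(n, snap_steps)        # last slot is never written
final_n1, snaps_n1 = toy_kernel(n + 1, snap_steps)  # last slot is written
print(final_n, snaps_n)
print(final_n1, snaps_n1)
```

In this toy version, running `n + 1` iterations does fill the final snapshot with the state after `n` updates, but the state returned at the end has received `n + 1` updates, so the last snapshot and the returned table differ by one update.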

## Issue 3 — narrative vs. code mismatch in "Running Q-learning"

`rs_inventory_q.md` line 658:

> We run $n$ = 5 million steps and take policy snapshots at steps 10,000, 1,000,000, and $n$.

```python
n = 5_000_000
snap_steps = np.array([10_000, 1_000_000, n], dtype=np.int64)
q_table, snapshots = q_learning_rs(model, n_steps=n+1, snapshot_steps=snap_steps)
```

`n_steps=n+1` is the workaround for Issue 2. With `n_steps=n`, the loop as shown stops at `t = n - 1`, so the check `t == n` never fires inside it and the snapshot at step `n` would not be recorded there. Two ways to make the narrative consistent:

1. Move the snapshot recording to *after* the Q update (or after the loop), then call with `n_steps=n`.
2. Or keep the snapshot at the start of the iteration but adjust indices so the caller passes `n_steps=n` and `snapshot_steps=[..., n-1]` etc.

Option 1 is cleaner and would let the prose stay literal ("run `n` steps").
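A rough sketch of what Option 1 could look like, using a hypothetical `toy_kernel_after` in which an update counter stands in for `q`; the real kernel would call `greedy_policy_from_q_rs(q, K)` where the counter is stored. The check moves below the update and compares against `t + 1`, the number of updates completed so far.

```python
import numpy as np

def toy_kernel_after(n_steps, snapshot_steps):
    """Hypothetical stand-in: snapshot *after* the update of iteration t."""
    updates = 0  # stands in for the state of q
    # -1 marks a snapshot slot that was never written
    snapshots = np.full(len(snapshot_steps), -1, dtype=np.int64)
    snap_idx = 0
    for t in range(n_steps):
        updates += 1  # stands in for the q[x, a] update
        # t + 1 updates have completed at this point
        if snap_idx < len(snapshot_steps) and t + 1 == snapshot_steps[snap_idx]:
            snapshots[snap_idx] = updates
            snap_idx += 1
    return updates, snapshots

n = 5
snap_steps = np.array([2, n], dtype=np.int64)
final_updates, snaps = toy_kernel_after(n, snap_steps)  # called with n, not n + 1
print(final_updates, snaps)
```

With this placement the final snapshot and the returned state both reflect `n` updates. One consequence worth noting: a snapshot step of `0` (the initial table) would no longer be recorded inside the loop and would need separate handling if it is ever requested.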

---

Happy to send a small PR if useful. None of these are blocking, and `lecture-dp` is mirroring as-is, so this stays a single-source fix.

🤖 Found via Copilot review on QuantEcon/lecture-dp#20
