Skip to content

rs_inventory_q / inventory_q: misleading 'Converged' print + snapshot off-by-one in Q-learning loop #882

@mmcky

Description

@mmcky

While reviewing lectures/rs_inventory_q.md as part of mirroring it into QuantEcon/lecture-dp, Copilot flagged three things — all real, none blocking, but worth a small follow-up here.

Issue 1 — `solve_rs_inventory_model` prints "Converged" unconditionally

`rs_inventory_q.md` (and the same pattern in `inventory_q.md` line 331):

def solve_rs_inventory_model(v_init, model, max_iter=10_000, tol=1e-6):
    v = v_init.copy()
    i, error = 0, tol + 1
    while i < max_iter and error > tol:
        new_v = T_rs(v, model)
        error = np.max(np.abs(new_v - v))
        i += 1
        v = new_v
    print(f"Converged in {i} iterations with error {error:.2e}")
    ...

If the loop exits because `i == max_iter` (before `error <= tol`), the print still claims convergence. With the default 10k cap that won't bite at the configured parameters, but it's misleading for anyone who tweaks settings.

Suggestion: only print "Converged" when `error <= tol`, otherwise warn (or raise). Same fix applies in `inventory_q.md`.

Issue 2 — snapshot recorded at the start of the loop iteration

`rs_inventory_q.md` (`q_learning_rs_kernel`):

for t in range(n_steps):
    # Record policy snapshot if needed
    if snap_idx < n_snaps and t == snapshot_steps[snap_idx]:
        snapshots[snap_idx] = greedy_policy_from_q_rs(q, K)
        snap_idx += 1
    ...
    q[x, a] = (1 - α) * q[x, a] + α * target
    ...

The snapshot at index `t` captures `q` before the update for step `t`. So a "snapshot at step n" actually reflects `q` after `n` completed updates only if `t = n` is visited after the n-th update — which requires running `n + 1` iterations.

That's exactly what the caller does, but it's load-bearing for the narrative…

Issue 3 — narrative vs. code mismatch in "Running Q-learning"

`rs_inventory_q.md` line 658:

We run $n$ = 5 million steps and take policy snapshots at steps 10,000, 1,000,000, and $n$.

n = 5_000_000
snap_steps = np.array([10_000, 1_000_000, n], dtype=np.int64)
q_table, snapshots = q_learning_rs(model, n_steps=n+1, snapshot_steps=snap_steps)

`n_steps=n+1` is the workaround for Issue 2 — without it, the snapshot at step `n` would not capture the final update. Two ways to make the narrative consistent:

  1. Move the snapshot recording to after the Q update (or after the loop), then call with `n_steps=n`.
  2. Or keep the snapshot at the start of the iteration but adjust indices so the caller passes `n_steps=n` and `snapshot_steps=[..., n-1]` etc.

Option 1 is cleaner and would let the prose stay literal ("run `n` steps").


Happy to send a small PR if useful — none of these are blocking and `lecture-dp` is mirroring as-is so this stays a single-source fix.

🤖 Found via Copilot review on QuantEcon/lecture-dp#20

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions