While reviewing lectures/rs_inventory_q.md as part of mirroring it into QuantEcon/lecture-dp, Copilot flagged three things — all real, none blocking, but worth a small follow-up here.
Issue 1 — `solve_rs_inventory_model` prints "Converged" unconditionally
`rs_inventory_q.md` (and the same pattern in `inventory_q.md` line 331):
def solve_rs_inventory_model(v_init, model, max_iter=10_000, tol=1e-6):
v = v_init.copy()
i, error = 0, tol + 1
while i < max_iter and error > tol:
new_v = T_rs(v, model)
error = np.max(np.abs(new_v - v))
i += 1
v = new_v
print(f"Converged in {i} iterations with error {error:.2e}")
...
If the loop exits because `i == max_iter` (before `error <= tol`), the print still claims convergence. With the default 10k cap that won't bite at the configured parameters, but it's misleading for anyone who tweaks settings.
Suggestion: only print "Converged" when `error <= tol`, otherwise warn (or raise). Same fix applies in `inventory_q.md`.
Issue 2 — snapshot recorded at the start of the loop iteration
`rs_inventory_q.md` (`q_learning_rs_kernel`):
for t in range(n_steps):
# Record policy snapshot if needed
if snap_idx < n_snaps and t == snapshot_steps[snap_idx]:
snapshots[snap_idx] = greedy_policy_from_q_rs(q, K)
snap_idx += 1
...
q[x, a] = (1 - α) * q[x, a] + α * target
...
The snapshot at index `t` captures `q` before the update for step `t`. So a "snapshot at step n" actually reflects `q` after `n` completed updates only if `t = n` is visited after the n-th update — which requires running `n + 1` iterations.
That's exactly what the caller does, but it's load-bearing for the narrative…
Issue 3 — narrative vs. code mismatch in "Running Q-learning"
`rs_inventory_q.md` line 658:
We run $n$ = 5 million steps and take policy snapshots at steps 10,000, 1,000,000, and $n$.
n = 5_000_000
snap_steps = np.array([10_000, 1_000_000, n], dtype=np.int64)
q_table, snapshots = q_learning_rs(model, n_steps=n+1, snapshot_steps=snap_steps)
`n_steps=n+1` is the workaround for Issue 2 — without it, the snapshot at step `n` would not capture the final update. Two ways to make the narrative consistent:
- Move the snapshot recording to after the Q update (or after the loop), then call with `n_steps=n`.
- Or keep the snapshot at the start of the iteration but adjust indices so the caller passes `n_steps=n` and `snapshot_steps=[..., n-1]` etc.
Option 1 is cleaner and would let the prose stay literal ("run `n` steps").
Happy to send a small PR if useful — none of these are blocking and `lecture-dp` is mirroring as-is so this stays a single-source fix.
🤖 Found via Copilot review on QuantEcon/lecture-dp#20
While reviewing
lectures/rs_inventory_q.mdas part of mirroring it intoQuantEcon/lecture-dp, Copilot flagged three things — all real, none blocking, but worth a small follow-up here.Issue 1 — `solve_rs_inventory_model` prints "Converged" unconditionally
`rs_inventory_q.md` (and the same pattern in `inventory_q.md` line 331):
If the loop exits because `i == max_iter` (before `error <= tol`), the print still claims convergence. With the default 10k cap that won't bite at the configured parameters, but it's misleading for anyone who tweaks settings.
Suggestion: only print "Converged" when `error <= tol`, otherwise warn (or raise). Same fix applies in `inventory_q.md`.
Issue 2 — snapshot recorded at the start of the loop iteration
`rs_inventory_q.md` (`q_learning_rs_kernel`):
The snapshot at index `t` captures `q` before the update for step `t`. So a "snapshot at step n" actually reflects `q` after `n` completed updates only if `t = n` is visited after the n-th update — which requires running `n + 1` iterations.
That's exactly what the caller does, but it's load-bearing for the narrative…
Issue 3 — narrative vs. code mismatch in "Running Q-learning"
`rs_inventory_q.md` line 658:
`n_steps=n+1` is the workaround for Issue 2 — without it, the snapshot at step `n` would not capture the final update. Two ways to make the narrative consistent:
Option 1 is cleaner and would let the prose stay literal ("run `n` steps").
Happy to send a small PR if useful — none of these are blocking and `lecture-dp` is mirroring as-is so this stays a single-source fix.
🤖 Found via Copilot review on QuantEcon/lecture-dp#20