Hi there,
I noticed that `max_future_q` in `update_q_table(self, LEARNING_RATE, DISCOUNT, old_paths, new_paths)` might be computed from the wrong state. It appears to use `old_paths.path_queues` (the current state), whereas the Q-learning target should use `new_paths.path_queues` (the next state reached after the action). Could you please confirm whether the following correction is valid? Thank you very much!
```python
# Compute indices for the new state (next state)
future_indices = [
    math.ceil(min(10, new_paths.path_queues[0] / 10)),  # New state (queue 1)
    math.ceil(min(10, new_paths.path_queues[1] / 10)),  # New state (queue 2)
]

# Get the best Q-value for the new state
max_future_q = np.max(self.q_table[:, future_indices[0], future_indices[1]])
```
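For context, here is a minimal standalone sketch of one tabular Q-learning step showing where each state belongs. It assumes `q_table` is indexed as `(action, queue1_bucket, queue2_bucket)`, as the slicing above suggests, and that the current Q-value is read from the old state. The helper names `state_indices` and `q_update`, and the `action`/`reward` arguments, are hypothetical and not taken from the repository:

```python
import math
import numpy as np

def state_indices(path_queues, cap=10, bucket=10):
    """Discretize the first two queue lengths into indices in [0, cap] (hypothetical helper)."""
    return tuple(math.ceil(min(cap, q / bucket)) for q in path_queues[:2])

def q_update(q_table, action, reward, old_queues, new_queues,
             learning_rate, discount):
    """One tabular Q-learning step; assumes q_table shape (n_actions, 11, 11)."""
    s = state_indices(old_queues)        # current state: where the action was taken
    s_next = state_indices(new_queues)   # next state: where the action led
    current_q = q_table[(action, *s)]
    # The bootstrap target uses the best action value in the NEXT state
    max_future_q = np.max(q_table[:, s_next[0], s_next[1]])
    new_q = (1 - learning_rate) * current_q + learning_rate * (
        reward + discount * max_future_q
    )
    q_table[(action, *s)] = new_q        # the update is written to the CURRENT state
    return new_q
```

In this sketch only `max_future_q` reads from the new state; the entry being updated still belongs to the old state, which is the split the correction above is aiming for.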