I want to use immediate rewards from the environment to train my RL model. As described in the documentation, I implemented the "reward" function in the "Environment" class.
However, when I checked the loss calculation flow in train.py, losses['v'] seems to consider only the value output by the model and the outcome from the environment. I also found that losses['r'] is the one that takes the rewards from the environment into account.
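To make the distinction concrete, here is a minimal sketch of how I understand the two targets. This is my own illustration, not the actual train.py code, and the function names (value_targets, return_targets, squared_error_loss) and the discount factor gamma are hypothetical. The value target is the final outcome repeated for every step, while the return target at each step is the discounted sum of the immediate rewards from that step onward.

```python
from typing import List


def value_targets(outcome: float, num_steps: int) -> List[float]:
    # Hypothetical: every step of an episode is regressed toward the final outcome.
    return [outcome] * num_steps


def return_targets(rewards: List[float], gamma: float = 1.0) -> List[float]:
    # Hypothetical: discounted sum of future immediate rewards, computed backward.
    targets = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets


def squared_error_loss(predictions: List[float], targets: List[float]) -> float:
    # Simple half squared error, standing in for whatever loss train.py really uses.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / 2


if __name__ == "__main__":
    rewards = [0.0, 1.0, 0.0, 2.0]
    print(return_targets(rewards))        # [3.0, 3.0, 2.0, 2.0]
    print(return_targets(rewards, 0.5))   # [0.75, 1.5, 1.0, 2.0]
    print(value_targets(1.0, len(rewards)))
```

If this reading is right, only a model that outputs something to compare against return_targets would actually learn from the immediate rewards, which is what prompts my question.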
Does this mean that my model also needs to output a "return" value?