Dear AgentFormer authors,
I would like to get a better understanding of the AgentFormer model. As I review the source code, one of the parameters that has been rather difficult for me to understand is the sn_out_type attribute of the FutureDecoder. So far, here's what I understood about this variable:
- it is a string, with default value "scene_norm", but it can also take the values "norm" or "vel" if the model is configured as such (by specifying the parameter value in the cfg file)
- it alters the behaviour of the decode_traj_ar method of the FutureDecoder, by modifying the format of the output seq_out that is being predicted. seq_out is the sequence of positions that is actually predicted by the model.
  - by default, no modification is performed at all.
- seq_out is translated into dec_motion, which is the actual variable that is compared with the ground truth for loss computation. This is done independently from the alterations performed by sn_out_type (instead, it is done by pred_type, which, quite clearly, defines whether the model is supposed to predict velocities, positions, or positions aligned with the scene origin).
- sn_out_type does not alter the behaviour of the code in any other place than in FutureDecoder.
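To make my current reading concrete, here is a minimal sketch of what I think the three values might do to seq_out. This is not the repository's code: the function name interpret_seq_out, the last_obs_pos argument, and the exact conversions for "norm" (offsets from the last observed position) and "vel" (per-step displacements that get accumulated) are my own assumptions, and only the "no modification by default" branch is something I could confirm from the source.

```python
import numpy as np


def interpret_seq_out(seq_out, last_obs_pos, sn_out_type="scene_norm"):
    """Hypothetical helper (not part of AgentFormer) illustrating my guess.

    seq_out:      (T, 2) raw decoder outputs for a single agent
    last_obs_pos: (2,) last observed position, in scene-normalized coordinates
    """
    seq_out = np.asarray(seq_out, dtype=float)
    last_obs_pos = np.asarray(last_obs_pos, dtype=float)

    if sn_out_type == "scene_norm":
        # Default: the output is used as-is, no modification at all.
        return seq_out
    if sn_out_type == "norm":
        # Assumption: outputs are offsets relative to the last observed position.
        return seq_out + last_obs_pos
    if sn_out_type == "vel":
        # Assumption: outputs are per-step displacements, accumulated over time.
        return np.cumsum(seq_out, axis=0) + last_obs_pos
    raise ValueError(f"unknown sn_out_type: {sn_out_type!r}")


raw = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
last = [2.0, 3.0]
for mode in ("scene_norm", "norm", "vel"):
    print(mode, interpret_seq_out(raw, last, mode).tolist())
```

If this guess is right, all three modes would end up producing positions in the same scene-normalized frame, and only the meaning of the raw decoder output would differ.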
From my observations, I am inclined to believe that sn_out_type defines the format of the ground truth. If the ground truth were presented to the model in a different format from the predicted one (e.g., if the ground truth trajectories are expressed as velocities while the model predicts scene-aligned positions), then sn_out_type would be responsible for translating the predicted sequence into the right format before prediction and ground truth are compared for loss calculation. Is this correct? I am uncertain, because I've found that sn_out_type does not alter the processing of the ground truth in the data_generator or the preprocessor classes, which I would expect it to do if my guess that this variable acts as a regulator between ground truth and prediction were right.
If anyone could help me clarify this, I would be very thankful.