Hey, there.
After I read the code, I am confused that the computation cost can be reduced by mask more tokens. Did I miss anything?
PS. I see the FLOPS is calculated by the length of tokens retented at each layer which is counted during inference.
Do you have inference latency metrics?
Hey, there.
After I read the code, I am confused that the computation cost can be reduced by mask more tokens. Did I miss anything?
PS. I see the FLOPS is calculated by the length of tokens retented at each layer which is counted during inference.
Do you have inference latency metrics?