In $\S4.4$, the paper says that the entropy context is reset at new lines and that the approximate monotonicity constraint is used because it suffers less from "entropy drift" caused by changes in context length.
I'm observing some puzzling behavior that I suspect is due to how I treat "\n" during preprocessing. While trying to work out the cause, I re-read the paper and realized that entropy drift might be related.
So I'm wondering about the implementation details of resetting the entropy context.
E.g. do you simply call `text.split("\n")` before feeding the text to entropyLM? That would incur a lot of padding and de-padding. Or are there other tricks?
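To make the question concrete, here is a minimal sketch of the naive split-and-pad approach I have in mind. It is only my guess, not something taken from the paper or the repo: the helper names (`split_keep_newline`, `pad_batch`, `unpad`) and `PAD_ID` are hypothetical, and I'm assuming the entropy model consumes raw UTF-8 bytes.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding id; the real model may use a different one


def split_keep_newline(text: str) -> List[str]:
    """Split on "\n" only, keeping each newline attached to the line it ends."""
    parts = text.split("\n")
    segments = [p + "\n" for p in parts[:-1]]
    if parts[-1]:  # trailing text that is not terminated by a newline
        segments.append(parts[-1])
    return segments


def pad_batch(segments: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Encode each segment as bytes and right-pad to the longest segment."""
    encoded = [list(s.encode("utf-8")) for s in segments]
    lengths = [len(e) for e in encoded]
    max_len = max(lengths, default=0)
    batch = [e + [PAD_ID] * (max_len - len(e)) for e in encoded]
    return batch, lengths


def unpad(batch: List[List[int]], lengths: List[int]) -> List[int]:
    """Drop the padding and concatenate rows back into one flat sequence."""
    return [v for row, n in zip(batch, lengths) for v in row[:n]]


segments = split_keep_newline("ab\n\ncd")
batch, lengths = pad_batch(segments)
# Each row would be scored by the entropy model independently, so the
# context is reset at every newline; per-byte entropies would then be
# de-padded the same way `unpad` flattens the byte ids here.
flat = unpad(batch, lengths)
```

With many short lines, most of each row is padding, which is the overhead I'm worried about.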
Thank you in advance!!