How to reset entropy context -- avoiding entropy "drifting"?

$\S4.4$ of the paper says: `we reset the entropy context with new lines and use
approximate monotonicity constraint as it suffers less from "entropy drift" from changes in context length`.

I'm observing puzzling behavior that I suspect is due to how I treat "\n" during preprocessing. While looking for the cause, I re-read the paper and found that entropy drift might be related.
So I'm wondering about the implementation details of resetting the entropy context.

E.g., do you simply call `text.split("\n")` before feeding the text to entropyLM? That would incur a lot of padding and de-padding. Or are there other tricks?
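
To make the question concrete, here is a minimal sketch of the naive approach I have in mind. It is only my guess, not the paper's or the repo's implementation. `entropy_fn` is a hypothetical stand-in for the entropy model: it takes one segment of bytes and returns one entropy value per byte, seeing only that segment as context. I am also assuming that the newline byte belongs to the segment it terminates, so the reset happens right after each "\n".

```python
from typing import Callable, List

NEWLINE = 0x0A  # byte value of "\n"


def split_on_newlines(data: bytes) -> List[bytes]:
    """Split data into segments that each end right after a newline byte.

    The last segment has no trailing newline if the input does not end with one.
    Concatenating the segments reproduces the input exactly.
    """
    segments: List[bytes] = []
    start = 0
    for i, b in enumerate(data):
        if b == NEWLINE:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])
    return segments


def entropies_with_reset(
    data: bytes,
    entropy_fn: Callable[[bytes], List[float]],  # hypothetical model wrapper
) -> List[float]:
    """Score each segment independently and stitch the results back together.

    Because entropy_fn never sees bytes from a previous line, its context is
    effectively reset at every newline. Output has one value per input byte.
    """
    out: List[float] = []
    for segment in split_on_newlines(data):
        scores = entropy_fn(segment)
        if len(scores) != len(segment):
            raise ValueError("entropy_fn must return one value per byte")
        out.extend(scores)
    return out


def toy_entropy_fn(segment: bytes) -> List[float]:
    """Toy stand-in (NOT a real model): returns the position within the segment,
    which makes the reset at each newline easy to see."""
    return [float(i) for i in range(len(segment))]


if __name__ == "__main__":
    print(split_on_newlines(b"ab\ncd\n\ne"))
    print(entropies_with_reset(b"ab\ncd\n\ne", toy_entropy_fn))
```

This version scores one line at a time, so it needs no padding but is slow. Batching the segments would be faster, but that is exactly where the padding and de-padding overhead I'm asking about comes in.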

Thank you in advance!!
