Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the “right reasons”?

Code for the paper Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the “right reasons”? (ACL 2024 long paper).

Installation

To get started, clone the repository and install the dependencies:

git clone https://github.com/TongLiu-github/TemperatureSaling4RTs.git
cd TemperatureSaling4RTs
pip install -r requirements.txt

How to run

To run GPT-2 (s/m/l/xl) on the Natural Stories and Brown corpora:

sh run.sh

Results are stored in ./PPP_Calculation_{corpus}/surprisals/1000/gpt2_{size}/gpt2_{size}__ PPP_result{K}.txt.

Comment 1: Inside the above script, the following command calculates $\Delta_{\mathcal{llh}}$ at $T=1$:

python PPP_calculation.py -n 1000 -data_name ${data_name0} -model_name ${model_name0} -cuda_num "0"  -K 10 -T_optimal 1.0

For $T \geq 1$ (e.g., $T \in [1.0, 10.0]$):

python PPP_calculation.py -n 1000 -data_name ${data_name0} -model_name ${model_name0} -cuda_num "0"  -K 0
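The two invocations above differ only in the -K value and the presence of -T_optimal. As a convenience, they could be assembled programmatically; the following is a minimal sketch, not part of the repository. The helper name build_ppp_command and the placeholder values "DATA_NAME" and "MODEL_NAME" are assumptions for illustration; only the flags themselves are taken from the commands above.

```python
import shlex


def build_ppp_command(data_name, model_name, k, t_optimal=None, n=1000, cuda_num="0"):
    """Assemble the argument list for PPP_calculation.py (hypothetical helper).

    Only flags shown in this README are used; -T_optimal is appended
    only when a value is given, mirroring the two commands above.
    """
    cmd = [
        "python", "PPP_calculation.py",
        "-n", str(n),
        "-data_name", data_name,
        "-model_name", model_name,
        "-cuda_num", cuda_num,
        "-K", str(k),
    ]
    if t_optimal is not None:
        cmd += ["-T_optimal", str(t_optimal)]
    return cmd


# Placeholder names: substitute the values that run.sh assigns to
# data_name0 and model_name0.
at_t1 = build_ppp_command("DATA_NAME", "MODEL_NAME", k=10, t_optimal=1.0)
sweep = build_ppp_command("DATA_NAME", "MODEL_NAME", k=0)

# Print the commands instead of executing them.
print(shlex.join(at_t1))
print(shlex.join(sweep))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root if running from Python is preferred over run.sh.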

Comment 2: Core components of the temperature-scaling code:

Calculate logits, probabilities and labels (utils.py).
Scale logits using temperature (lines 230-231 in PPP_calculation.py).
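The scaling step can be illustrated with a small standalone sketch. This is not the code from PPP_calculation.py; it assumes the standard formulation in which logits are divided by $T$ before the softmax, and it reports surprisal in bits (the unit used in the repository may differ). The function name and the example logits are made up for illustration.

```python
import math


def temperature_scaled_surprisal(logits, target_index, temperature=1.0):
    """Surprisal (in bits) of the target token after dividing logits by T.

    Illustrative only: assumes p_T = softmax(logits / T).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    log_prob = scaled[target_index] - log_norm  # natural-log probability
    return -log_prob / math.log(2)  # convert nats to bits


# Made-up logits over a three-token vocabulary.
logits = [2.0, 1.0, 0.0]
for T in (1.0, 2.0, 10.0):
    print(T, temperature_scaled_surprisal(logits, 0, T))
```

With $T > 1$ the distribution flattens, so the surprisal of the most likely token rises and that of unlikely tokens falls; at $T = 1$ the estimate is the model's unscaled surprisal.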

Comment 3: For experiments on Dundee, the procedure is the same as above. The Dundee data is larger and is therefore not uploaded to this repository.

Processed Data

We provide processed data for Natural Stories and Brown in ./PPP_Calculation_{corpus}/data/all.txt.annotation.filtered.csv.

BibTeX

@inproceedings{liu-etal-2024-temperature,
    title = "Temperature-scaling surprisal estimates improve fit to human reading times {--} but does it do so for the {\textquotedblleft}right reasons{\textquotedblright}?",
    author = "Liu, Tong  and
      {\v{S}}krjanec, Iza  and
      Demberg, Vera",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.519/",
    doi = "10.18653/v1/2024.acl-long.519",
    pages = "9598--9619"
}
