confused with the pooling strategy? #5

@rxqy

Hi, I'm confused about the pooling strategy used here.

For training, you use average pooling:

--pooling_strategy avg \

For evaluation, however, no pooling flag is specified:

BeLLM/README.md, lines 99 to 105 in 9da9269

2) evaluate on STS benchmark
```bash
BiLLM_START_INDEX=31 CUDA_VISIBLE_DEVICES=0 python eval_sts.py \
--model_name_or_path NousResearch/Llama-2-7b-hf \
--lora_name_or_path SeanLee97/bellm-llama-7b-nli \
--apply_bfloat16 0
```

So evaluation should fall back to the default value, cls, right?
parser.add_argument("--pooling_strategy", type=str, default='cls')
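As a quick check of how that default behaves, here is a standalone argparse sketch. Only the `add_argument` line is taken from the repository; the rest is illustrative, and it says nothing about what `eval_sts.py` does with the value afterwards.

```python
import argparse

# Standalone parser that reuses only the argument definition quoted above.
parser = argparse.ArgumentParser()
parser.add_argument("--pooling_strategy", type=str, default='cls')

# No flag given, as in the README evaluation command: the default applies.
default_args = parser.parse_args([])
print(default_args.pooling_strategy)  # cls

# Flag given explicitly, as in the training command.
explicit_args = parser.parse_args(["--pooling_strategy", "avg"])
print(explicit_args.pooling_strategy)  # avg
```

So unless the script overrides the value elsewhere, leaving the flag out means training and evaluation use different pooling.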

In the paper, you mention using the representative word as the pivot, which I take to mean the last non-padding token. Is that right? Which token should I use for evaluation, or does the choice make no difference in a decoder-based model like Llama?
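To make the three options concrete, here is a minimal sketch of how I understand them. This is not the repository's code: the function `pool` and the strategy name `last` are hypothetical, the example uses plain Python lists instead of tensors so it runs without dependencies, and it assumes right-padded inputs.

```python
from typing import List


def pool(hidden: List[List[float]], mask: List[int], strategy: str) -> List[float]:
    """Reduce the per-token vectors of one sequence to a single vector.

    hidden:   one vector per token position (seq_len x dim)
    mask:     1 for real tokens, 0 for padding (right-padded; an assumption)
    strategy: "cls", "avg", or "last" ("last" is a hypothetical name)
    """
    # Keep only the vectors that belong to real (non-padding) tokens.
    real = [vec for vec, m in zip(hidden, mask) if m == 1]
    if not real:
        raise ValueError("sequence has no non-padding tokens")

    if strategy == "cls":
        # First position. In a purely causal decoder this position
        # cannot attend to any later token.
        return list(hidden[0])
    if strategy == "avg":
        # Mean over non-padding positions, one dimension at a time.
        dim = len(real[0])
        return [sum(vec[i] for vec in real) / len(real) for i in range(dim)]
    if strategy == "last":
        # Last non-padding position.
        return list(real[-1])
    raise ValueError(f"unknown pooling strategy: {strategy}")


# Toy sequence: three real tokens followed by one padding position.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

print(pool(hidden, mask, "cls"))   # [1.0, 2.0]
print(pool(hidden, mask, "avg"))   # [3.0, 4.0]
print(pool(hidden, mask, "last"))  # [5.0, 6.0]
```

The three strategies return different vectors for the same input, which is why I would like to know which one matches the reported results.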
