How to use the tokenizer?

Thanks for releasing the model on Hugging Face.

I want to use the text encoder, which means I first need to tokenize the input. How should the tokenizer be used? Can we take it from the CLIPProcessor?

 from transformers import CLIPProcessor

 processor = CLIPProcessor.from_pretrained("vinid/plip")
 tokenizer = processor.tokenizer


But with this, the tokenizer's model_max_length is an extremely large value: 1000000000000000019884624838656.
So I was wondering whether this is the correct way to use it.

CLIPTokenizerFast(name_or_path='vinid/plip', vocab_size=49408, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	49406: AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
	49407: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
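
For reference, here is a minimal sketch of one possible workaround. The reported value is exactly what Python gives for `int(1e30)`, which suggests it is a "no limit configured" placeholder rather than a real limit. The sketch assumes PLIP keeps the standard CLIP text context length of 77 tokens (not confirmed here), and `resolve_max_length` and `tokenize_for_text_encoder` are hypothetical helper names, not part of any library.

```python
# int(1e30) reproduces the model_max_length value printed above.
PLACEHOLDER_MAX_LENGTH = int(1e30)

# Assumption: PLIP uses the standard CLIP text context length of 77 tokens.
ASSUMED_CONTEXT_LENGTH = 77


def resolve_max_length(model_max_length, fallback=ASSUMED_CONTEXT_LENGTH):
    """Return a usable max_length, falling back if the placeholder is reported."""
    if model_max_length >= PLACEHOLDER_MAX_LENGTH:
        return fallback
    return model_max_length


def tokenize_for_text_encoder(tokenizer, texts):
    """Tokenize texts with an explicit length cap (hypothetical helper).

    `tokenizer` is expected to be a Hugging Face tokenizer such as
    `processor.tokenizer`; return_tensors="pt" requires PyTorch.
    """
    max_length = resolve_max_length(tokenizer.model_max_length)
    return tokenizer(
        texts,
        padding="max_length",  # pad every sequence to max_length
        truncation=True,       # cut off sequences longer than max_length
        max_length=max_length,
        return_tensors="pt",
    )
```

With the tokenizer from the snippet above, this would be called as `tokenize_for_text_encoder(tokenizer, ["some caption"])`, and the resulting `input_ids` and `attention_mask` would then be passed to the text encoder.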
