AutoTokenizer. A tokenizer is responsible for preprocessing text into an array of numbers that serves as input to a model. Several rules govern the tokenization process, including how a word is split and at what level (word, subword, or character) the splitting happens.
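To make the idea of "text in, array of numbers out" concrete, here is a minimal sketch of subword tokenization in plain Python. It is not how any particular library implements it: the vocabulary, the `##` continuation prefix, and the helper names `split_word` and `encode` are all assumptions chosen for illustration, and the splitting rule is a simple greedy longest-match.

```python
# Toy vocabulary (hypothetical). Real tokenizers ship vocabularies with
# tens of thousands of entries learned from data.
VOCAB = {"[UNK]": 0, "token": 1, "##izer": 2, "##s": 3, "split": 4, "text": 5}


def split_word(word, vocab):
    """Greedily split one word into the longest subword pieces found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a "##" prefix (assumed convention).
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word maps to the unknown token.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab=VOCAB):
    """Lowercase, split on whitespace, split words into pieces, map pieces to ids."""
    ids = []
    for word in text.lower().split():
        ids.extend(vocab[piece] for piece in split_word(word, vocab))
    return ids


print(split_word("tokenizers", VOCAB))
print(encode("Tokenizers split text"))
```

The two rules mentioned above show up directly: `split_word` decides how a word is split, and the choice of vocabulary entries decides at what level the splitting stops.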
An AutoClass automatically retrieves the relevant model class given the name or path of the pretrained weights/vocabulary. AutoTokenizer is a generic tokenizer class that is instantiated as one of the base tokenizer classes when created with the AutoTokenizer.from_pretrained() classmethod.
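The dispatch idea behind an AutoClass can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's code: `ToyAutoTokenizer`, `ToyBertTokenizer`, `ToyGPT2Tokenizer`, and `TOKENIZER_REGISTRY` are hypothetical names, and matching on the checkpoint name is an assumption made to keep the sketch short.

```python
class BaseToyTokenizer:
    """Hypothetical base class; only remembers which checkpoint it was built from."""

    def __init__(self, checkpoint_name):
        self.checkpoint_name = checkpoint_name


class ToyBertTokenizer(BaseToyTokenizer):
    pass


class ToyGPT2Tokenizer(BaseToyTokenizer):
    pass


# Hypothetical registry mapping a model-type keyword to a concrete tokenizer class.
TOKENIZER_REGISTRY = {
    "bert": ToyBertTokenizer,
    "gpt2": ToyGPT2Tokenizer,
}


class ToyAutoTokenizer:
    """Generic entry point: never instantiated directly, only via from_pretrained()."""

    def __init__(self):
        raise RuntimeError("Use ToyAutoTokenizer.from_pretrained(name) instead.")

    @classmethod
    def from_pretrained(cls, checkpoint_name):
        # Assumption for this sketch: the model type is inferred from the name.
        lowered = checkpoint_name.lower()
        for model_type, tokenizer_cls in TOKENIZER_REGISTRY.items():
            if model_type in lowered:
                return tokenizer_cls(checkpoint_name)
        raise ValueError(f"No tokenizer registered for {checkpoint_name!r}")


tokenizer = ToyAutoTokenizer.from_pretrained("bert-base-uncased")
print(type(tokenizer).__name__)
```

The caller only ever names the checkpoint; the concrete class is chosen for them. With the Transformers library installed, the corresponding call would be `AutoTokenizer.from_pretrained("bert-base-uncased")`.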