A more compute-optimal 70B model, called Chinchilla, is trained on 1.4 trillion tokens. Not only does Chinchilla outperform its much larger counterpart, Gopher (280B parameters), but its smaller size also cuts inference cost considerably.
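To get a feel for the arithmetic, here is a minimal sketch using two commonly cited approximations from the Chinchilla paper rather than anything stated above: training compute C ≈ 6·N·D FLOPs, a compute-optimal ratio of roughly 20 training tokens per parameter, and Gopher's ~300B-token training run. The numbers and helper names are illustrative assumptions.

```python
# Rough sketch of the Chinchilla compute-optimal rule of thumb.
# Assumptions (not from the post itself): training FLOPs ~ 6 * N * D,
# ~20 compute-optimal training tokens per parameter, and Gopher
# having been trained on ~300B tokens.

def optimal_tokens(num_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a given model size."""
    return num_params * tokens_per_param

def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute: C ~ 6 * N * D FLOPs."""
    return 6.0 * num_params * num_tokens

if __name__ == "__main__":
    chinchilla_params = 70e9                                # 70B parameters
    chinchilla_tokens = optimal_tokens(chinchilla_params)   # ~1.4e12 tokens
    print(f"Chinchilla tokens: {chinchilla_tokens:.2e}")
    print(f"Chinchilla FLOPs:  {training_flops(chinchilla_params, chinchilla_tokens):.2e}")

    gopher_params = 280e9                                   # 280B parameters
    gopher_tokens = 300e9                                    # ~300B training tokens
    print(f"Gopher FLOPs:      {training_flops(gopher_params, gopher_tokens):.2e}")
```

Under these approximations the two training budgets come out roughly comparable (~5-6e23 FLOPs), which is the point: for about the same compute, a smaller model trained on far more tokens wins.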
References:
https://sh-tsang.medium.com/brief-review-chinchilla-training-compute-optimal-large-language-models-7e4d00680142