Formalizing BPE Tokenization

Berglund, M; van der Merwe, B

Berglund, M (corresponding author), Umea Univ, Dept Comp Sci, Umea, Sweden.

ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2023; 388: 16

Abstract

In this paper, we formalize practical byte pair encoding tokenization as it is used in large language models and other NLP systems; in particular, we f......
