Byte Pair Encoding Python

LBPE: Long-token-first Tokenization to Improve Large Language Models

Abstract: The prevalent use of Byte Pair Encoding (BPE) in Large Language Models (LLMs) facilitates robust handling of subword units and avoids issues of out-of-vocabulary words. Despite its success, ...

IEEE

Tibetan Speech Recognition Based on Hierarchical Transfer Strategy and Subword Modeling

Abstract: For Tibetan speech recognition, the accuracy is often limited due to several challenges, including the heavy reliance of end-to-end systems on large amounts of annotated data, inconsistent ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

LBPE: Long-token-first Tokenization to Improve Large Language Models

Tibetan Speech Recognition Based on Hierarchical Transfer Strategy and Subword Modeling

Trending now