Abstract: The introduction of transformer models, which utilize a self-attention mechanism within deep neural networks, represents a notable breakthrough in natural language processing. This ...