Attention mechanisms are used in deep learning to tell an algorithm to focus more on certain parts of its input than on others. They direct a machine learning model to spend more of its processing capacity on the parts of the input data that have been assigned higher priority. Attention is often used in natural language processing and computer vision, and can be used as part of recurrent neural networks. The most common types of attention mechanisms rely on dot-product attention or multi-head attention.
Dot-product attention uses the dot product between vectors to determine what needs attention: the alignment score for each input position is computed from the dot product between a decoder hidden state and the corresponding encoder hidden state. Multi-head attention, on the other hand, runs several attention operations (typically dot-product attention) in parallel, each on its own learned projection of the inputs, and combines their outputs. Rather than relying on a single attention computation, a multi-head approach lets each head attend to different aspects of the input, so the network's attention is directed more flexibly.
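As a rough illustration, the NumPy sketch below (not taken from any particular library; the function names, projection matrices, and toy dimensions are assumptions made only for this example) computes scaled dot-product attention, then builds a minimal multi-head version by projecting the inputs, splitting them into heads, running dot-product attention in each head, and concatenating the results.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: (m, d)   e.g. decoder hidden states
    keys:    (n, d)   e.g. encoder hidden states
    values:  (n, d_v) vectors combined into the output
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # alignment scores, one per (query, key) pair
    weights = softmax(scores, axis=-1)       # each query's weights sum to 1
    return weights @ values                  # weighted sum of the values for each query

def multi_head_attention(queries, keys, values, num_heads, w_q, w_k, w_v, w_o):
    """Multi-head attention sketch: project the inputs, run several attention
    heads in parallel, concatenate the head outputs, and project once more.

    w_q, w_k, w_v, w_o stand in for learned projection matrices of shape
    (d_model, d_model); here they are simply passed in as NumPy arrays.
    """
    d_model = queries.shape[-1]
    d_head = d_model // num_heads

    def split_heads(x):
        # (seq, d_model) -> (num_heads, seq, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(queries @ w_q)
    k = split_heads(keys @ w_k)
    v = split_heads(values @ w_v)
    heads = [dot_product_attention(q[h], k[h], v[h]) for h in range(num_heads)]
    concat = np.concatenate(heads, axis=-1)  # (seq, d_model)
    return concat @ w_o

# Toy example: 5 encoder states, 3 decoder queries, model width 8, 2 heads.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=(3, 8))
proj = [rng.normal(size=(8, 8)) * 0.1 for _ in range(4)]
out = multi_head_attention(dec, enc, enc, num_heads=2,
                           w_q=proj[0], w_k=proj[1], w_v=proj[2], w_o=proj[3])
print(out.shape)  # (3, 8): one attended vector per query
```

In a trained model the projection matrices would be learned; they are random here only so the example runs on its own.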
Attention mechanisms are often used in natural language processing to speed up processing and ensure that computational time is spent effectively. When processing language, the attention mechanism typically sits in the network that connects the encoder to the decoder. As the input sentence is processed, a number of time steps occur, and the attention unit is often applied immediately after the input is encoded. At each step, attention weights are assigned to the encoder hidden states, and a weighted sum of those states is sent along to the decoder. Generally, each input word in a sentence is assigned a weight reflecting how relevant it is to the output currently being produced, so the most relevant words receive the most attention. This is especially important in machine translation and recurrent neural network architectures. Translation services often use a sequence-to-sequence approach, where the output of one step becomes the input context for the next, and decisions about the output are made based on the weights assigned to the encoder states and the alignment scores. Attention in a sequence-to-sequence approach helps ensure that output words appear in an appropriate order.
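The sketch below shows how this might look inside a single decoding loop of a toy sequence-to-sequence model. It assumes dot-product alignment scores, and the `decode_with_attention` function, the `w_dec` matrix, and the tanh update are hypothetical placeholders standing in for a real GRU/LSTM decoder cell: at each time step the decoder's current state is scored against every encoder state, the scores become attention weights, and the weighted sum of encoder states (the context vector) is fed into the decoder's next update.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_with_attention(enc_states, dec_init, w_dec, num_steps):
    """One decoding loop of a toy seq2seq model with dot-product attention.

    enc_states: (n, d)    encoder hidden states, one per input word
    dec_init:   (d,)      initial decoder hidden state
    w_dec:      (2*d, d)  placeholder recurrent weights (stand-in for a real
                          GRU/LSTM cell); the attention logic is the point here
    """
    d = dec_init.shape[-1]
    dec_state = dec_init
    steps = []
    for _ in range(num_steps):
        # Alignment scores: how relevant is each encoder state to this step?
        scores = enc_states @ dec_state / np.sqrt(d)
        weights = softmax(scores)           # attention weights over input words
        context = weights @ enc_states      # weighted sum sent to the decoder
        steps.append((weights, context))
        # Feed the context and previous state into a toy recurrent update.
        dec_state = np.tanh(np.concatenate([dec_state, context]) @ w_dec)
    return steps

# Toy example: a 6-word input sentence encoded into 6 hidden states of width 8.
rng = np.random.default_rng(1)
enc = rng.normal(size=(6, 8))
decoded = decode_with_attention(enc, rng.normal(size=(8,)),
                                rng.normal(size=(16, 8)) * 0.1, num_steps=3)
for t, (w, _) in enumerate(decoded):
    print(f"step {t}: attention over input words = {np.round(w, 2)}")
```

Printing the weights at each step shows how the decoder shifts its focus across the input words as it produces successive outputs.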
Attention mechanisms are helpful in machine learning because they improve: