
Attention Is All You Need: The Core Idea of the Transformer




    In this blog post, I will discuss one of the most influential papers in deep learning, “Attention Is All You Need” by Vaswani et al. First I will cover the self-attention mechanism and then move on to the architectural details of the Transformer. In the previous blog post, From Seq2Seq to Attention: Revolutionizing Sequence Modeling, I discussed the origin of the attention mechanism and Bahdanau attention. This post builds on that material, so if you haven’t read the previous one, go check it out. The Bahdanau attention model uses two RNNs and an attention mechanism to assign weights to the encoder’s hidden states. In “Attention Is All You Need”, the authors get rid of the RNNs entirely: they introduce a new architecture that does not use recurrence and instead relies completely on the self-attention mechanism. Let me explain what the self-attention mechanism is:

The self-attention mechanism enables the model to capture dependencies between different positions within a sequence by attending to all positions simultaneously. In the previous blog, we discussed using queries and key-value pairs to calculate attention scores, where each score measures the relevance of a key-value pair to a given query. Self-attention applies the same mechanism within a single sequence: the queries, keys, and values are all derived from the same input, so no external sequence is required.
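To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention over a single sequence. The toy dimensions (a sequence of 4 tokens with model dimension 8), the random inputs, and the projection matrices W_q, W_k, W_v are illustrative placeholders of my own, not values from the paper; the point is only that queries, keys, and values all come from the same input X.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one sequence X of shape (seq_len, d_model)."""
    Q = X @ W_q              # queries, shape (seq_len, d_k)
    K = X @ W_k              # keys,    shape (seq_len, d_k)
    V = X @ W_v              # values,  shape (seq_len, d_v)
    d_k = Q.shape[-1]

    # Every position attends to every position in the same sequence.
    scores = Q @ K.T / np.sqrt(d_k)                           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # (seq_len, d_v)

# Toy example: 4 tokens, embedding size 8 (illustrative values only).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))        # token embeddings
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Notice that the attention weight matrix is (seq_len, seq_len): row i tells us how much position i attends to every other position in the same sequence, which is exactly the "dependencies between different positions" described above.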
