In the context of my research, this means that the RNN can take in a whole sentence, considering how the individual words come together rather than treating each word in isolation.
However, as more and more steps are stacked, the later ones gradually lose the signal coming from the earlier ones and end up effectively ignoring them: the gradient that carries information back through the network shrinks a little at every step. This is commonly known as the “Vanishing Gradient Problem” [https://en.wikipedia.org/wiki/Vanishing_gradient_problem]
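To make that concrete, here is a tiny Python sketch of my own (the 0.9 weight and the 50 steps are just illustrative numbers, not taken from the Wikipedia article): when the gradient is multiplied by a factor smaller than one at every step back through the sentence, the contribution of the earliest words shrinks towards zero.

```python
# Toy illustration of a vanishing gradient: every step back through the
# sentence multiplies the gradient by (roughly) the same recurrent weight,
# so a weight below 1 makes the signal from early words fade away.
recurrent_weight = 0.9   # hypothetical magnitude of the recurrent weight
gradient = 1.0           # gradient arriving at the final word

for step in range(1, 51):
    gradient *= recurrent_weight              # one step further back
    if step in (10, 25, 50):
        print(f"{step} steps back: gradient is about {gradient:.4f}")

# 10 steps back: gradient is about 0.3487
# 25 steps back: gradient is about 0.0718
# 50 steps back: gradient is about 0.0052
```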
Luckily, this is a problem that Google has invested time and money into solving, and the result is a more complex architecture called the Transformer. Rather than stacking more recurrent layers, it is built around an attention mechanism that looks across the surrounding words or sentences and can determine which words or concepts are essential. For more detail, I recommend the article “The Illustrated Transformer” by Jay Alammar. [https://jalammar.github.io/illustrated-transformer/]
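For readers who prefer code to prose, below is a minimal sketch of the attention idea at the heart of the Transformer (scaled dot-product self-attention). The four word vectors are made up purely for illustration; they are not from Jay Alammar's article.

```python
import numpy as np

def attention(queries, keys, values):
    # Score every word against every other word, scale, then softmax so each
    # row becomes a set of weights saying how essential the other words are.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Four made-up 3-dimensional vectors standing in for an embedded sentence.
words = np.array([[1.0, 0.0, 0.5],
                  [0.2, 1.0, 0.1],
                  [0.9, 0.1, 0.4],
                  [0.0, 0.3, 1.0]])

blended, weights = attention(words, words, words)
print(np.round(weights, 2))  # each row: how much one word attends to the others
```

Because every word can attend directly to every other word, nothing has to survive a long chain of step-by-step multiplications, which is how this design sidesteps the vanishing-gradient issue described above.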