We’ve seen a growing number of attempts to augment RNNs with new properties. Four directions stand out as particularly exciting: neural turing machines, attentional interfaces, adaptive computation time and neural programmers.
Individually, these techniques are all potent extensions of RNNs, but the really striking thing is that they can be combined together, and seem to just be points in a broader space. Further, they all rely on the same underlying trick — something called attention — to work.
-