NMT (neural machine translation with a plain encoder-decoder) relies on only the last encoder state

attention is an idea that uses a **weighted combination of input states**

like a memory reference where the access is soft.
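A minimal sketch of that soft access, in plain Python. The dot-product scoring and the names `softmax`, `attend`, `states`, and `query` are assumptions made for illustration; the notes do not say how the weights are computed.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Return (context, weights) for one query over a list of state vectors."""
    # Assumed scoring: dot product between the query and each input state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    # The context is the weighted combination of all input states.
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return context, weights

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context, weights = attend(query, states)
```

Every state contributes to `context`, just with a different weight, which is what makes the access "soft" rather than a hard lookup of a single position.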

attention and memory in deep learning and nlp


Show, Attend and Tell for image captioning

Grammar as a Foreign Language for language parsing

Teaching Machines to Read and Comprehend for QA

End-to-End Memory Networks allow the network to read the same input sequence multiple times before producing an output, updating the memory contents at each step.
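The repeated reading can be sketched as a loop of soft reads ("hops") over the same memory. This is a rough illustration under stated assumptions, not the paper's implementation: it keeps the stored memory fixed and only adds each read vector to the query state between hops, it has no learned embeddings, and the names `read_memory` and `multi_hop` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, memory):
    """One soft read: weight every memory slot by its match with the query."""
    scores = [sum(q * v for q, v in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]

def multi_hop(query, memory, hops=3):
    """Read the same memory several times, refining the state after each read."""
    state = list(query)
    for _ in range(hops):
        read = read_memory(state, memory)
        # Assumed update rule: add the read vector to the current state.
        state = [s + r for s, r in zip(state, read)]
    return state

memory = [[1.0, 0.0], [0.0, 1.0]]
final_state = multi_hop([1.0, 0.0], memory, hops=3)
```

Each hop sees the result of the previous one, so later reads can focus on slots that only became relevant after earlier reads.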

Neural Turing Machines use a similar form of memory mechanism, but with a more sophisticated type of addressing that uses both content-based (like here) and location-based addressing, allowing the network to learn addressing patterns to execute simple computer programs, like sorting algorithms.
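A simplified sketch of the two addressing modes, with assumptions labelled: content-based addressing is shown as a softmax over cosine similarity scaled by a sharpness value `beta`, and location-based addressing as a circular shift of the resulting weights. The gating and sharpening steps of the full model are left out, and all function names here are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)  # small epsilon avoids division by zero

def content_weights(key, memory, beta=5.0):
    """Content-based addressing: favour slots whose contents resemble the key."""
    scores = [beta * cosine(key, slot) for slot in memory]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def shift_weights(weights, shift_dist):
    """Location-based addressing: circularly shift the weights.

    shift_dist maps an integer offset to the probability of that offset.
    """
    n = len(weights)
    out = [0.0] * n
    for i in range(n):
        for offset, p in shift_dist.items():
            out[(i + offset) % n] += p * weights[i]
    return out

memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
w_content = content_weights([1.0, 0.0], memory)   # focuses on slot 0
w_shifted = shift_weights(w_content, {1: 1.0})    # moves the focus to slot 1
```

Combining the two lets an access pattern such as "find this item, then step to the next slot" be expressed entirely with differentiable weights.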

future work may draw a clearer distinction between memory and attention mechanisms, perhaps along the lines of Reinforcement Learning Neural Turing Machines, which try to learn access patterns to deal with external interfaces.

further reading

