vanilla NMT (encoder-decoder) relies on only the last encoder state

attention is an idea that uses a **weighted combination of input states**

like a memory reference where the access is soft.
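This soft-read idea can be sketched in a few lines of numpy. The dot-product scoring and the toy vectors below are illustrative assumptions, not taken from any particular paper:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention(query, states):
    # query: (d,) current decoder state; states: (T, d) all encoder states
    scores = states @ query      # one score per input position (dot product)
    weights = softmax(scores)    # soft "address": non-negative, sums to 1
    context = weights @ states   # weighted combination of input states
    return context, weights

# toy example (made-up vectors): the query overlaps states 0 and 3 most
states = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.],
                   [1., 1., 0.]])
query = np.array([1., 0., 0.])
context, weights = soft_attention(query, states)
```

Unlike a hard memory lookup, every position contributes to the result; the weights just concentrate mass on the relevant ones.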

Attention and Memory in Deep Learning and NLP


Show, Attend and Tell for image captioning

Grammar as a Foreign Language for language parsing

Teaching Machines to Read and Comprehend for QA

End-to-End Memory Networks allow the network to read the same input sequence multiple times before making an output, updating the memory contents at each step.
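A minimal sketch of this multi-hop read, assuming the simplified variant where one embedded memory serves as both keys and values and the query is updated additively between hops (the function name and shapes are illustrative):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multi_hop_read(query, memory, hops=3):
    # query: (d,) question embedding; memory: (N, d) embedded input sentences
    u = query
    for _ in range(hops):
        p = softmax(memory @ u)  # attention distribution over memory slots
        o = p @ memory           # soft read from memory
        u = u + o                # refine the query for the next pass
    return u
```

Each hop re-reads the whole input with a sharper, context-informed query, which is what lets the model chain facts across sentences.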

Neural Turing Machines use a similar form of memory mechanism, but with a more sophisticated type of addressing that uses both content-based (as here) and location-based addressing, allowing the network to learn addressing patterns to execute simple computer programs, like sorting algorithms.
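The combined addressing can be sketched roughly as below. This is a simplification of the NTM scheme: the argument names, the small epsilon, and the omission of the final sharpening step are my assumptions for brevity:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ntm_addressing(memory, key, beta, g, shift, prev_w):
    # memory: (N, d); key: (d,); beta: focus sharpness; g: gate in [0, 1]
    # shift: (N,) distribution over circular shifts; prev_w: (N,) last weighting
    n = len(prev_w)
    # content-based addressing: cosine similarity sharpened by beta
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + 1e-8)
    w_c = softmax(beta * sim)
    # location-based part 1: interpolate with the previous weighting
    w_g = g * w_c + (1 - g) * prev_w
    # location-based part 2: circular convolution with the shift distribution
    w = np.array([sum(w_g[j] * shift[(i - j) % n] for j in range(n))
                  for i in range(n)])
    # NTM's final sharpening step is omitted here for brevity
    return w / w.sum()
```

Content addressing finds a slot by similarity; the gate and shift then let the controller move relative to where it last read, which is what makes learned iteration (e.g. over a list being sorted) possible.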

a possible future direction: a clearer distinction between memory and attention mechanisms, perhaps along the lines of Reinforcement Learning Neural Turing Machines, which try to learn access patterns to deal with external interfaces.

further reading

2016.1.22 Attention mechanisms and applications to QA over knowledge bases

Zhihu question