
As most of us know, attention focuses on the specific parts of the input sequence that are most relevant for generating the output sequence.

Ex: The driver could not drive the car fast because it had a problem.

  1. How does attention find the specific parts (here, 'it') in the input, and how does it assign a score to that token?

  2. Is attention a context-based model?

  3. How are attention maps (query, key, value) obtained?

  4. On what basis does attention assign higher weights to input tokens? (A sketch of the computation I have in mind is included below.)
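
For reference, here is my understanding of the scaled dot-product attention computation these questions refer to: each token's query vector is compared against every token's key vector, the resulting scores go through a softmax to give the attention weights, and those weights mix the value vectors. A minimal NumPy sketch (the single-head setup, random projections, and shapes are illustrative assumptions, not taken from any particular model):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (seq_len, d_k) matrices obtained by projecting the token embeddings
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)   # similarity of every query with every key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention map
        return weights @ V, weights       # weighted sum of values, plus the map itself

    # Toy example: 4 tokens with 8-dimensional projections (random, just to show the shapes)
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    output, attn_map = scaled_dot_product_attention(Q, K, V)
    print(attn_map)  # each row: how much one token attends to every other token

Each row of attn_map sums to 1, and higher values mean that token contributes more to the output at that position, which is what questions 1 and 4 are asking about.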

tovijayak

1 Answer

  1. Neural networks are considered black boxes because they are not interpretable: we don't know why they compute the results they do.

  2. Yes.

  3. To get attention maps you can use the BERTViz library; a sketch is shown after this list.


  4. Same answer as 1).
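
A minimal sketch of what that looks like with BERTViz and Hugging Face Transformers (the choice of bert-base-uncased and running inside a Jupyter notebook are assumptions; any BERT-style model loaded with output_attentions=True should work):

    # pip install bertviz transformers
    from transformers import AutoTokenizer, AutoModel
    from bertviz import head_view

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    sentence = "The driver could not drive the car fast because it had a problem."
    inputs = tokenizer.encode(sentence, return_tensors="pt")
    outputs = model(inputs)
    attention = outputs.attentions                      # one attention tensor per layer
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])
    head_view(attention, tokens)                        # interactive per-head attention map

head_view lets you inspect, head by head, which tokens "it" attends to in the example sentence from the question.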

noe
  • There have been big advances in interpretability; it's no longer fair to call them strictly black boxes. See Neel Nanda's derivation of the modular arithmetic algorithm from a transformer's weights with a linear probe. Additionally, tools like RASP/TRACR exist with which we can directly program discrete algorithms into transformer weights and analyze them. – Andy Jun 21 '23 at 17:07
  • @Andy do you think any of those studies shed any light on the OP's question? – noe Jun 21 '23 at 18:22
  • Absolutely! They directly address the common misconception that attention is actually "what to focus on", which informs how points 1 and 4 are answered. RASP/TRACR especially helps understand those because you directly control what's happening. – Andy Jun 21 '23 at 18:27
  • 1
    Then by all means write an answer describing it. Personally, I think that most questions regarding the interpretation of the results of a deep neural net are asked from a wrong conception of how neural networks work and, therefore, I like to provide a somewhat generic answer describing how neural networks are trained end-to-end and are not necessarily interpretable, rather than providing a nuanced but not-very-useful overview of the advances of neural network interpretability, because I think that's not what these questions are seeking or needing. But others are free of thinking otherwise, oc! – noe Jun 21 '23 at 18:35
  • 1
    That's fair. I do intend to start addressing points 1 and 4 with an answer later when I'm home, as you note it's not going to be useful without some care and I need to catch myself up on the progress on TRACR as I haven't touched it since January. I do think we're getting closer to stronger interpretability and have some experiments I'd like to try for algorithm discovery and recovery (not in the AlphaTensor sense), hopefully in the near future we'll have nice resources to link beginners to with some intuitive theory. – Andy Jun 21 '23 at 18:42