It's common practice to normalize the inputs to a neural network.
Let's assume we have a vector of activations.
One such technique, Layer Normalization, looks at the vector's components, re-centers the vector by subtracting its mean $\mu$, and then divides by the standard deviation $\sigma$.
How is it then possible to distinguish the activations [1, 2, 3, 4] from [4, 5, 6, 7], if both will be re-centered to the same vector [-1.5, -0.5, 0.5, 1.5] and then divided by the standard deviation? I can see the same problem when merely normalizing the input vectors of any neural net.
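To make the question concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) of the normalization I mean; it shows both vectors collapsing to the same output:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Re-center by the mean, then divide by the standard deviation.
    mu = x.mean()
    sigma = x.std()
    return (x - mu) / (sigma + eps)

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 5.0, 6.0, 7.0])

print(layer_norm(a))  # approx. [-1.342, -0.447, 0.447, 1.342]
print(layer_norm(b))  # identical output: the original shift and scale are gone
```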
Edit:
There seems to be a hint in the first half of page 4 of the paper, but due to my weakness in maths I can't comprehend it :(
Edit after accepting the answer:
Don't forget that Layer Norm (and Batch Norm) both have learnable gain and bias terms. If the normalization hurts the network, the gain can be learned to undo the division by the standard deviation, and the bias can be learned to undo the re-centering. This allows some neurons to keep paying attention to scale and shift when that information is really needed (see the sketch below).
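As a minimal sketch under that assumption (my own illustration, not any library's exact implementation), a per-feature gain $\gamma$ and bias $\beta$ applied after normalization can in principle restore the original scale and shift:

```python
import numpy as np

def layer_norm_affine(x, gamma, beta, eps=1e-5):
    # Normalize, then apply the learnable gain (gamma) and bias (beta).
    mu = x.mean()
    sigma = x.std()
    x_hat = (x - mu) / (sigma + eps)
    return gamma * x_hat + beta

x = np.array([4.0, 5.0, 6.0, 7.0])

# If training pushes gamma towards the original std and beta towards the
# original mean, the normalization is effectively undone for this layer.
gamma = x.std()
beta = x.mean()
print(layer_norm_affine(x, gamma, beta))  # approximately [4, 5, 6, 7]
```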