I am reviewing some course material in which the lecturer suggests that, instead of guessing the learning-rate parameter in a gradient descent implementation, one could use the inverse of the Hessian multiplied by the negative of the Jacobian (i.e., the gradient, for a scalar objective) to determine the step.
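To make sure I am reading this correctly, I believe the update being described is the Newton step

$$\theta_{t+1} = \theta_t - H^{-1}\,\nabla f(\theta_t),$$

where $f$ is the objective and $H$ is its Hessian at $\theta_t$, as opposed to the plain gradient-descent update $\theta_{t+1} = \theta_t - \alpha\,\nabla f(\theta_t)$ with a hand-chosen learning rate $\alpha$. (The notation here is mine, not the lecturer's.)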
Any help with the intuition behind using the inverse of the Hessian would be much appreciated.
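For concreteness, here is a minimal sketch of what I understand the suggestion to be, run on a toy quadratic. The objective, the matrix `A`, and all names are my own, purely for illustration:

```python
import numpy as np

# Toy quadratic objective f(x) = 0.5 * x^T A x - b^T x,
# whose gradient is A x - b and whose Hessian is the constant matrix A.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])  # symmetric positive definite
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b

def hess(x):
    return A

x_gd = np.zeros(2)
x_newton = np.zeros(2)
lr = 0.1  # guessed learning rate for plain gradient descent

for _ in range(50):
    # Plain gradient descent: fixed, hand-tuned step size.
    x_gd = x_gd - lr * grad(x_gd)

    # Newton step: solve H d = -g rather than forming the inverse explicitly.
    d = np.linalg.solve(hess(x_newton), -grad(x_newton))
    x_newton = x_newton + d

print("gradient descent:", x_gd)
print("Newton's method: ", x_newton)
print("exact minimiser: ", np.linalg.solve(A, b))
```

On this quadratic the Hessian is constant, so the Newton update lands on the minimiser in a single step, while gradient descent with my guessed learning rate needs many iterations. It is exactly this behaviour I would like to understand intuitively: why does scaling the negative gradient by the inverse Hessian produce such a well-sized step?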