If you accept the usual, easily visualized definition of $A^T$ as flipping $A$ along its diagonal, which switches its rows and columns, then that equality follows from properties of the dot product: $(Ax) \cdot y = (Ax)^T y = (x^T A^T) y = x^T (A^T y) = x \cdot (A^T y)$. The first and last equalities use the fact that $u \cdot v = u^T v$ for column vectors. The second comes from the following property of transposes: $(AB)^T = B^T A^T$ for two matrices $A$ and $B$, where $B$ could be a vector, since we can think of a vector with $n$ entries as an $n \times 1$ matrix. The third comes from associativity of matrix multiplication.
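To make the chain of equalities concrete, here is a small numerical check in plain Python. It is only a sketch: the matrix `A`, the vectors `x` and `y`, and the helper names `transpose`, `matvec`, and `dot` are made up for illustration and do not come from Strang's book.

```python
# Illustrative check that (Ax) . y == x . (A^T y) for one small example.
# All names and values here are assumptions chosen for the demonstration.

def transpose(M):
    """Flip M along its diagonal: rows become columns."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix-vector product: one dot product per row of M."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    """Ordinary dot product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, 2, 3],
     [4, 5, 6]]      # a 2 x 3 matrix
x = [1, 0, -1]       # x has 3 entries, so Ax has 2 entries
y = [2, 3]           # y has 2 entries, so A^T y has 3 entries

lhs = dot(matvec(A, x), y)             # (Ax) . y
rhs = dot(x, matvec(transpose(A), y))  # x . (A^T y)

print(lhs, rhs)  # both sides give -10
```

Note that $x$ and $y$ live in different spaces when $A$ is not square, so the two dot products are taken in different dimensions, yet they agree.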
As to why Strang introduces that definition in terms of dot products, I would guess it is partly because the “easier” definition of the transpose as “flipping a matrix” might be a little casual from a mathematical standpoint. Moreover, the new definition characterizes the transpose entirely in terms of the dot product, so it has the benefit of being more algebraic and of using only one or two basic operations (dot product and matrix multiplication). It doesn’t appeal to the physical idea of flipping a matrix along the diagonal, which is easy to visualize but whose connection to the algebra and to the basic operations of linear algebra is harder to see. Both approaches are important and useful.
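One way to see that the dot-product definition really does pin down the same matrix as the "flipping" definition is to test it on standard basis vectors. This is a short sketch, assuming $A$ is $m \times n$ and writing $e_j \in \mathbb{R}^n$ and $e_i \in \mathbb{R}^m$ for standard basis vectors (notation introduced here, not taken from the question):

$$(A e_j) \cdot e_i = A_{ij}, \qquad e_j \cdot (A^T e_i) = (A^T)_{ji}.$$

The left-hand expression picks out entry $i$ of column $j$ of $A$, and the right-hand expression picks out entry $j$ of column $i$ of $A^T$. Requiring $(Ax) \cdot y = x \cdot (A^T y)$ for all $x$ and $y$ therefore forces

$$(A^T)_{ji} = A_{ij},$$

which is exactly the statement that rows and columns are switched.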
Also, as a general remark, the dot product is an example of an inner product (and, more generally, of a bilinear form), which, if you haven’t encountered it already, you will probably encounter if you study more math or linear algebra. Inner products, inner product spaces, and bilinear forms are very important and general concepts in math. So the transpose can be introduced this way, and although this approach may seem alien and esoteric, it is actually more general (and more rigorous). Anyway, I don't know too much about that, so I won't comment further.
As for the geometric intuition, I’m not sure I can answer it better than the top-rated answer to the following post: “What is the geometric interpretation of the transpose?” That answer references the singular value decomposition, so you could wait until Chapter 7 of Strang’s book, in which he talks about the SVD as well as its geometry. Of course, there may be a quicker answer that is still satisfying and doesn’t reference the SVD, but I’m not currently aware of one.