The shortest path between two points in the plane is the straight line segment joining them. This segment naturally forms a diagonal of the axis-aligned rectangle that has those two points as opposite corners. So the more pressing question is, why is a straight line the shortest path? (And why do we measure distance the way we do [i.e., with "Euclidean distance"]?)
The fact that straight lines minimize distance was observed by Archimedes. You could loosely justify it by observing that any path that deviates from a straight line has to take a detour through a third point; rather than running from $A$ to $B$, the path would run from $A$ to $X$ to $B$. That detour bulges the path out and forms a shape roughly akin to a triangle. But the sum of two sides of a triangle cannot be shorter than the third side, due to the "triangle inequality", which we ordinarily take as a given in geometry, or which we can prove algebraically (using some understanding of vectors). So the shortest path should follow only one edge and be a straight line. But perhaps appealing to the triangle inequality is just begging your original question. So we could instead prove it with more rigour using more advanced techniques from the calculus of variations. See for example this proof and this MSE page.
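As a rough sketch of what such a proof can look like (not necessarily the argument in the linked pages), assume the path is a continuously differentiable curve $\gamma:[0,1]\to\mathbb{R}^2$ with $\gamma(0)=A$, $\gamma(1)=B$ and $A\neq B$. The symbols $\gamma$ and $u$ are introduced here only for illustration. Let $u=(B-A)/|B-A|$ be the unit vector pointing from $A$ to $B$. Then

$$
\operatorname{length}(\gamma)=\int_0^1 |\gamma'(t)|\,dt \;\ge\; \int_0^1 \gamma'(t)\cdot u\,dt \;=\; \bigl(\gamma(1)-\gamma(0)\bigr)\cdot u \;=\; (B-A)\cdot u \;=\; |B-A|.
$$

The inequality is Cauchy-Schwarz applied pointwise, $\gamma'(t)\cdot u\le|\gamma'(t)|\,|u|=|\gamma'(t)|$, and equality holds only when $\gamma'(t)$ is a nonnegative multiple of $u$ for every $t$, that is, when the path runs straight from $A$ to $B$ without backtracking.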
As to why we use Euclidean distance, a big factor is rotation invariance. Physically, we expect that when we take one step, we move exactly as far regardless of the direction we are facing. A one meter step north should go just as far as a one meter step northeast. This property holds for the Euclidean distance but not for the taxicab metric (i.e., horizontal plus vertical).
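The rotation-invariance claim is easy to check numerically. The following is a minimal sketch; the helper names `euclidean`, `taxicab` and `rotate` are made up for this illustration. It takes a unit step north, rotates it 45 degrees to face northeast, and measures both steps with each metric.

```python
import math

def euclidean(v):
    # Euclidean length: sqrt(x^2 + y^2)
    return math.hypot(v[0], v[1])

def taxicab(v):
    # Taxicab length: horizontal distance plus vertical distance
    return abs(v[0]) + abs(v[1])

def rotate(v, angle):
    # Rotate the vector v counterclockwise by `angle` radians
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

north = (0.0, 1.0)
# Turning 45 degrees clockwise from north faces northeast
northeast = rotate(north, -math.pi / 4)

print("Euclidean:", euclidean(north), euclidean(northeast))
print("Taxicab:  ", taxicab(north), taxicab(northeast))
```

Up to floating-point rounding, the Euclidean length is 1 for both steps, while the taxicab length grows from 1 to about $\sqrt{2}$ for the northeast step, even though it is physically the same step.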
The thing that's throwing me off is that if you imagine an infinitesimally small step in the x-direction and then imagining an infinitesimally small step in the y-direction and then do this an infinite amount of times then it would seem like the same distance as the diagonal path.
The fallacy here is a failure to apply the concept of convergence properly (i.e., the strict mathematical sense of "turning into [something]"). In particular, convergence of the paths does not imply convergence of their lengths. The "$\pi=4$" question linked in the comments is a good place to see how this can lead you astray. Observe that the total length depends both on the size of the summands and on how many summands there are. The smaller the pieces of your curve become, the more pieces you get! For the staircase these two rates balance exactly: halving the step size doubles the number of steps, so the total length never changes, no matter how closely the staircase hugs the diagonal. Moral of the story: you should actually compute the length of the curve rather than appealing to blind intuition.
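To "actually compute" it, here is a small sketch that assumes a staircase from $(0,0)$ to $(a,b)$ made of $n$ equal steps, each going $a/n$ to the right and then $b/n$ up. The function names are hypothetical and chosen for this example. It reports the staircase length together with the largest distance from the staircase to the diagonal.

```python
import math

def staircase_length(a, b, n):
    # Sum the lengths of n horizontal pieces (a/n) and n vertical pieces (b/n)
    return sum(abs(a) / n + abs(b) / n for _ in range(n))

def max_gap_to_diagonal(a, b, n):
    # The outer corner of each step, e.g. (a/n, 0), is the point of the
    # staircase farthest from the line through (0,0) and (a,b).
    # Its distance to that line is |(a/n) * b| / sqrt(a^2 + b^2).
    return abs(a * b) / n / math.hypot(a, b)

a, b = 1.0, 1.0
print("diagonal length:", math.hypot(a, b))
for n in (1, 10, 100, 1000):
    print(n, staircase_length(a, b, n), max_gap_to_diagonal(a, b, n))
```

The gap to the diagonal shrinks like $1/n$, so the staircases do converge to the diagonal as paths, yet the computed length stays at $a+b=2$ (up to rounding) for every $n$ and never approaches $\sqrt{2}$.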