$1)$ In general, if a function $f:X\rightarrow Y$ is continuous and there is a sequence of points, $x_1,x_2,x_3\dots\in X$ that converges to $a\in X$, then the sequence $f(x_1),f(x_2),f(x_3)\dots$ in $Y$ converges to $f(a)\in Y$ (this is fairly easy to prove).
$2)$ Another easy-to-prove fact is that, if a sequence $x_1,x_2,x_3\dots\in X$ converges to $w\in X$ and a sequence $y_1,y_2,y_3,\dots\in Y$ converges to $z\in Y$ (where $X$ and $Y$ are arbitrary topological spaces), then the sequence $(x_1,y_1),(x_2,y_2),(x_3,y_3),\dots\in X\times Y$ converges to $(w,z)\in X\times Y$.
$3)$ Lastly, it is a standard fact that, in general, if $(X,d)$ is any metric space, then the function $d:X\times X\rightarrow \mathbb R$ is continuous (you can find a proof in the accepted answer here).
These are the only main facts required.
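As a quick sketch of why fact $3)$ holds (the primed points $x',y'$ are introduced here only for illustration): applying the triangle inequality twice gives $d(x,y)\le d(x,x')+d(x',y')+d(y',y)$, and the same with the roles of the two pairs swapped, so for all $x,y,x',y'\in X$,

$$|d(x,y)-d(x',y')|\le d(x,x')+d(y,y').$$

In particular, $|d(x,y)-d(x',y')|<\epsilon$ whenever $d(x,x')<\epsilon/2$ and $d(y,y')<\epsilon/2$, and the set of such pairs $(x',y')$ is an open neighbourhood of $(x,y)$ in $X\times X$.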
To start with, suppose (for some fixed $x\in M$) that $T^{a_1}(x),T^{a_2}(x),T^{a_3}(x),\dots$ is a convergent subsequence of $T(x),T^2(x),T^3(x),\dots$. Then we set $x^*$ equal to the limit of $(T^{a_i}(x))_{i\in \mathbb N}$.
Now, given any $a\in M$ and any $\epsilon>0$, if we set $\delta=\epsilon$, then whenever $d(a,b)<\delta$ we have $d(T(a),T(b))\le d(a,b)<\delta=\epsilon$ (the first inequality is strict unless $a=b$, in which case both sides are $0$). This means $T$ is continuous.
Using fact $1)$ (applied to $T$ and to $T^2=T\circ T$, which is continuous as a composition of continuous maps), we then get that $T^{a_1+1}(x),T^{a_2+1}(x),T^{a_3+1}(x),\dots$ converges to $T(x^*)$ and that $T^{a_1+2}(x),T^{a_2+2}(x),T^{a_3+2}(x),\dots$ converges to $T^2(x^*)$.
This, in combination with fact $2)$ means that the sequence $(T^{a_i}(x),T^{a_i+1}(x))_{i\in \mathbb N}\in M\times M$ converges to $(x^*,T(x^*))$ and that the sequence $(T^{a_i+1}(x),T^{a_i+2}(x))_{i\in \mathbb N}\in M\times M$ converges to $(T(x^*),T^2(x^*))$.
Finally, applying fact $1)$ to the function $d$, which is continuous by fact $3)$, and to the two convergent sequences in $M\times M$ just obtained, we have that the sequence $\big(d(T^{a_i}(x),T^{a_i+1}(x))\big)_{i\in \mathbb N}$ converges to $d(x^*,T(x^*))$ and that the sequence $\big(d(T^{a_i+1}(x),T^{a_i+2}(x))\big)_{i\in \mathbb N}$ converges to $d(T(x^*),T^2(x^*))$.
However, the sequences $d(T^{a_i}(x),T^{a_i+1}(x))_{i\in \mathbb N}$ and $d(T^{a_i+1}(x),T^{a_i+2}(x))_{i\in \mathbb N}$ are both subsequences of $d(T^i(x),T^{i+1}(x))_{i\in \mathbb N}$ which (as you have already established) is a sequence that converges in $\mathbb R$.
Also, every subsequence of a convergent sequence converges to the same limit as the full sequence, and a sequence in $\mathbb R$ cannot converge to more than one point.
Conclude that $d(x^*,T(x^*))=d(T(x^*),T^2(x^*))$.
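Written out as a single chain, the argument reads as follows (a summary sketch using only the limits discussed in this answer):

$$\begin{aligned}
d(x^*,T(x^*)) &= \lim_{i\to\infty} d\big(T^{a_i}(x),T^{a_i+1}(x)\big) \\
&= \lim_{i\to\infty} d\big(T^{i}(x),T^{i+1}(x)\big) \\
&= \lim_{i\to\infty} d\big(T^{a_i+1}(x),T^{a_i+2}(x)\big) \\
&= d\big(T(x^*),T^2(x^*)\big).
\end{aligned}$$

If, as the estimate used for continuity suggests, $T$ satisfies $d(T(a),T(b))<d(a,b)$ for all $a\neq b$ (this is an assumption about your setup), then this equality forces $T(x^*)=x^*$: otherwise, taking $a=x^*$ and $b=T(x^*)$ would give $d(T(x^*),T^2(x^*))<d(x^*,T(x^*))$.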