Counting serves two purposes: specifying sizes and specifying indices. For finite quantities these are directly related, because the number of natural numbers (including $0$) less than $n$ (that is, before position $n$) is exactly $n$. But when set theory generalizes counting to infinite sets, the two notions come apart.
$\def\nn{\mathbb{N}}$
$\def\zz{\mathbb{Z}}$
$\def\qq{\mathbb{Q}}$
$\def\rr{\mathbb{R}}$
$\def\eq{\leftrightarrow}$
$\def\less{\smallsetminus}$
$\def\none{\varnothing}$
Sizes
The notion of size is extended to infinite sets in the following way. Take any two sets $X,Y$. We say that $X$ is no larger than $Y$ iff we can label each element of $X$ with an element of $Y$ such that no two elements of $X$ are given the same label. Mathematically this labelling is called an injection from $X$ into $Y$, and we write $\def\inj{\hookrightarrow}$$X \inj Y$. We say that $X$ is of the same size as $Y$ iff there is a bijection (1-to-1 correspondence) between elements of $X$ and elements of $Y$, and we write $X \approx Y$. It turns out that if $X \inj Y$ and $Y \inj X$ then $X \approx Y$ (this is the Cantor–Schröder–Bernstein theorem).
Notice that $\nn_{>0} \inj \nn$ via the identity labelling, and $\nn \inj \nn_{>0}$ by labelling each element $n$ of $\nn$ by $n+1$. So $\nn_{>0} \approx \nn$, meaning that they cannot be distinguished in terms of size as defined above. There is no other elegant way to define size comparison to distinguish them, because we want it to be independent of labelling, meaning that we want $X \approx \{ f(x) : x \in X \}$ for any set $X$ and injective function $f$ on $X$.
Indices
The notion of indexing, on the other hand, can be extended in the following way. Indexing is used for a sequence, where order is important. Note that a sequence is nothing more than a function on its indices, where the indices are totally ordered. But if we want to define a sequence recursively by defining each element in terms of all the previous ones, then it is in general not enough for indices to be ordered. Specifically, a recursive definition on $I$ is a 'function' $E$ such that, for every $i$ in $I$ and function $f$ on $I_{<i}$, $E(f)$ is a function on $I_{\le i}$, and we want to build a function $f$ on $I$ such that $f \restriction I_{\le i} = E(f \restriction I_{<i})$ for any $i \in I$ ("$f \restriction D$" denotes "$f$ restricted to the domain $D$"). We cannot always do this if $I$ has a strictly decreasing infinite sequence. For a concrete example, there is no function $f$ on $\zz$ such that $f(r) = \cases{0 & if $\exists s \in \zz_{<r}\ ( f(s)=1 )$ \\ 1 & otherwise}$, despite it looking like a recursive definition. (Exercise: Prove that no such $f$ exists!) Notice that this counter-example can be easily adapted to any $I$ that has some strictly decreasing infinite sequence.
It turns out that, if $I$ has no strictly decreasing infinite sequence, then every recursive definition on $I$ not only has some sequence satisfying it (in the above sense), but that sequence is in fact unique! We say that a total order is well-ordered iff every non-empty set of elements in it has a minimum. Now given any such $I$ and any non-empty set $S$ of elements in $I$, if $S$ has no minimum then we can iteratively pick a (countable) strictly decreasing sequence of elements in $S$ [by the axiom of dependent choice], which is impossible. Therefore $I$ is well-ordered. Now take any recursive definition $E$ on $I$. If $E$ does not 'work', namely it does not uniquely define a sequence on $I$, then let $S$ be the set of all elements $i$ in $I$ such that there is no unique function $f$ on $I_{\le i}$ such that $f$ agrees with $E(f \restriction I_{<i})$, and let $m$ be the minimum of $S$ in $I$. Then for every $i \in I_{<m}$ there is a unique function on $I_{\le i}$ satisfying $E$, and all these functions agree, and hence combining them gives a unique function on $I_{<m}$ that satisfies $E$. By definition of $E$ there is a unique function on $I_{\le m}$ that satisfies $E$, contradicting the definition of $m$. Therefore $E$ does 'work', namely there is a unique sequence on $I$ that satisfies $E$.
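For a finite well-order the construction can be carried out mechanically. The following is a minimal Python sketch, assuming the index set is given as a list already sorted in increasing order. The names `build_by_recursion`, `rule` and `zero_if_earlier_one` are hypothetical, and `rule` plays the role of $E$ except that it returns only the new value at $i$ rather than the whole extended function.

```python
def build_by_recursion(indices, rule):
    """Build f on a finite well-order from a recursive rule.

    indices: the elements of I, listed in strictly increasing order.
    rule(i, partial): the value of f at i, given `partial`, a dict holding
        f restricted to the indices strictly before i.
    """
    f = {}
    for i in indices:
        # Pass a copy so that the rule cannot alter earlier values.
        f[i] = rule(i, dict(f))
    return f


def zero_if_earlier_one(i, partial):
    # The rule from the counter-example on Z, now used on a well-order:
    # 0 if some earlier value is 1, and 1 otherwise.
    return 0 if 1 in partial.values() else 1


f = build_by_recursion([0, 1, 2, 3, 4], zero_if_earlier_one)
print(f)
```

On $\{0,1,2,3,4\}$ this rule gives the value $1$ at the minimum and $0$ everywhere else. On $\zz$ the same rule has no solution, since there is no minimum from which to start.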
[Note: If you do not want to use the axiom of dependent choice, then you can still have recursive definitions for well-orders.]
The above is why we use well-orders for indexing, since every well-order has no strictly decreasing infinite sequence. [Technically $E$ need not be a function in ZFC set theory; it can be any definable function.] Also, the above proof technique generalizes to transfinite induction, namely that for any well-order $I$ and statement $Q$ we have $\forall i \in I\ ( \forall j \in I_{<i}\ ( Q(j) ) \to Q(i) ) \to \forall i \in I\ ( Q(i) )$.
Ordinals
Given any two total orders $X,Y$, we say that $X$ embeds into $Y$ if there is an injection $φ$ from $X$ into $Y$ that preserves the ordering, meaning that $a < b \eq φ(a) < φ(b)$ for every $a,b$ in $X$. Also, we say that $X$ is isomorphic to $Y$ if there is a bijection from $X$ to $Y$ that preserves the ordering. It is not too hard to prove that, for any two non-isomorphic well-orders, one of them embeds into the other, and not the other way around. Therefore well-orders are themselves totally ordered up to isomorphism, meaning that they satisfy the conditions for a total order except that equality is replaced by isomorphism.
Now the next natural step is to find a canonical form for well-orders, which we shall call ordinals. Take any well-order $X$. Recursively define the sequence $f$ on $X$ by $f(i) = \{ f(j) : j \in X_{<i} \}$ for each $i \in X$. Then its range $\{ f(i) : i \in X \}$ ordered under set membership $\in$ is isomorphic to $X$! (Exercise: Prove this by using transfinite induction to prove simultaneously that $\{ f(j) : j \in X_{\le i} \}$ and $f(i)$ are well-ordered under $\in$ for every $i$ in $X$.) We use this well-ordering as the (canonical) ordinal for $X$, which we shall denote by $ord(X)$. We will also order ordinals by $\in$ unless otherwise stated.
This also can motivate the von Neumann definition of ordinals as sets that are transitive (every element is a subset) and well-ordered under $\in$. It turns out that every (canonical) ordinal is also a von Neumann ordinal, and every von Neumann ordinal is its own (canonical) ordinal. (Exercise: Prove this. For the first part, first prove by transfinite induction that $f(i)$ is a von Neumann ordinal for every $i$ in $X$.)
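For a finite well-order, both the map $f$ and the von Neumann conditions can be checked directly. The following is a minimal Python sketch, assuming the well-order is given as a list in increasing order and representing sets as `frozenset` values. The function name `canonical_ordinal_map` and the four-element order `X` are hypothetical, chosen only for illustration.

```python
def canonical_ordinal_map(indices):
    """Return f with f(i) = { f(j) : j < i } for a finite well-order.

    indices: the elements of X, listed in strictly increasing order.
    """
    f = {}
    for i in indices:
        # Everything already stored in f belongs to an earlier index.
        f[i] = frozenset(f.values())
    return f


X = ["a", "b", "c", "d"]  # a hypothetical well-order a < b < c < d
f = canonical_ordinal_map(X)

# Order is preserved: i < j in X exactly when f(i) is an element of f(j).
for p, i in enumerate(X):
    for q, j in enumerate(X):
        assert (p < q) == (f[i] in f[j])

# Each f(i) is transitive: every element of f(i) is also a subset of f(i).
assert all(y <= f[i] for i in X for y in f[i])

print([len(f[i]) for i in X])
```

Each $f(i)$ has exactly as many elements as there are indices before $i$, which is the finite case of an ordinal being the set of all smaller ordinals.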
Note that no ordinal is an element of itself, otherwise it would have a strictly decreasing infinite sequence under $\in$. Consequently, in most set theories there cannot be a set $ORD$ of all ordinals, because $ORD$ would be a von Neumann ordinal and hence $ORD \in ORD$.
Cardinals
Now we go back to the notion of size. It turns out that we can use some of the ordinals to represent the sizes of infinite sets. Take any set $X$. Let $W$ be the set of all well-orders with elements in $X$, and let $A = \{ ord(w) : w \in W \}$. Then we can prove that $A$ is a von Neumann ordinal. Also there cannot be an injection $f$ from $A$ into $X$, otherwise we can define a well-order $<$ on $S = \{ f(a) : a \in A \}$ by $f(a) < f(b) \eq a \in b$, and then $A = ord(S,<) \in A$.
Given the full axiom of choice, we can prove that $X \approx A_{<i}$ for some $i \in A$ as follows. Recursively define $f(i)$ to be some element in $X \less \{ f(j) : j \in A_{<i} \}$ if there is one and $\none$ otherwise, for each $i \in A$. By definition of $A$, $f$ is not an injection from $A$ into $X$, and hence there is a least $i \in A$ such that $X \less \{ f(j) : j \in A_{<i} \}$ is empty, which implies that $f \restriction A_{<i}$ is a bijection between $A_{<i}$ and $X$. By the well-ordering of ordinals we can define $\#(X)$, the cardinality of $X$, to be the least ordinal such that there is a bijection between $X$ and it. [Without the axiom of choice, we cannot define $f$ and so this definition of cardinality fails, but by the well-ordering of ordinals there is a least ordinal $H$, called a Hartogs number, such that there is no injection from $H$ into $X$, which is also a sort of measure of size.]
The size condition (a bijection between $X$ and $\#(X)$) allows us to define recursive sequences on $X$ by ordering its elements according to the induced well-order. The minimality condition ensures that the partial sequence that we have to extend at each step of the recursion is strictly smaller than $X$, a property that the recursive definition itself may need.
For instance, we can prove by transfinite induction that $\#(S^2) = \#(S)$ for any infinite set $S$. Let $k = \#(S)$. Then it suffices to prove that $\#(k^2) = k$, and as induction hypothesis we may assume that $\#(d^2) = d$ for every infinite cardinal $d < k$. Order the elements of $k^2$ by maximum and then lexicographic order, which can be easily shown to be a well-order. Let $m$ be the ordinal for this well-order, and let $f$ be the isomorphism from $m$ to $k^2$. If $k < m$, then let $a,b \in k$ be such that $f(k) = (a,b)$, and let $d = \#(\max(a,b)) \le \max(a,b) < k$. Then $k$ is isomorphic to $\{ f(i) : i \in k \}$ $\subseteq \{ (x,y) : x,y \in k \land x,y \le \max(a,b) \}$, which has cardinality $\#((d⊕1)^2)$ $= \#(d^2⊕d⊕d⊕1) < k$, where "$p⊕q$" denotes a disjoint union of $p$ and $q$, namely $\{ (0,x) : x \in p \} \cup \{ (1,x) : x \in q \}$. The last inequality is because $k$ is infinite, and either $d$ is finite and so $d^2⊕d⊕d⊕1$ is also finite, or $d$ is infinite and so $\#(d^2⊕d⊕d⊕1) = \#(d⊕d⊕d⊕1) = d$ by the induction hypothesis. Thus $\#(k) < k$, contradicting the definition of $k$. Therefore $k \ge m \ge \#(k^2) \ge \#(k) = k$, and hence $\#(k^2) = k$.
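The ordering by maximum and then lexicographic order can be seen concretely for a finite $k$. The following is a minimal Python sketch of that finite analogue only, since the infinite case cannot be run. The names `max_then_lex` and `ordered_pairs` are hypothetical. It checks the key fact used above: every pair before $(a,b)$ lies in the square of side $\max(a,b)+1$.

```python
from itertools import product


def max_then_lex(pair):
    # Sort key: compare by the larger coordinate first, then lexicographically.
    a, b = pair
    return (max(a, b), a, b)


def ordered_pairs(k):
    """All pairs from {0,...,k-1}^2, sorted by maximum and then lexicographically."""
    return sorted(product(range(k), repeat=2), key=max_then_lex)


pairs = ordered_pairs(4)
print(pairs[:9])

# Every pair strictly before (a, b) lies in the square {0,...,max(a,b)}^2,
# so (a, b) has fewer than (max(a,b) + 1)^2 predecessors.
for position, (a, b) in enumerate(pairs):
    bound = max(a, b)
    assert all(x <= bound and y <= bound for x, y in pairs[:position])
    assert position < (bound + 1) ** 2
```

In the infinite case this bound is what makes every proper initial segment of $k^2$ smaller than $k$.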
(Exercise: Prove that there is a set $S$ of points in the plane such that every line passes through exactly two points in $S$. Sketch: Let $c = \#(\rr)$. Then there are $\#(\rr^2⊕\rr) = c$ many lines in the plane. We start with $S = \none$ and iterate through the lines indexed by $c$. We maintain the invariant that at each step $i \in c$ there are at most $2i$ many points added so far, which means that there are at most $(2i)^2$ points on the current line $L$ that cannot be added to $S$ without violating the desired property. But $(2i)^2+2 < c$ and $L$ passes through $c$ many points, so we can add (up to two) points to $S$ such that $L$ passes through exactly two points without violating the desired property.)