Each of the geometric random variables $X_1,X_2,\dots,X_r$ can take on any integer value $\ge 1$, so their sum $Z$ can take on any integer value $\ge r$.
For the formula for $\Pr(Z=k)$, we will assume that you are familiar with the binomial distribution.
The random variable $Z$ is the sum of independent identically distributed random variables with geometric distribution. So $Z$ is the "waiting time" (total number of trials) until the $r$-th success.
The $r$-th success occurs on the $k$-th trial if and only if (i) we had exactly $r-1$ successes in the first $k-1$ trials and (ii) we have a success on the $k$-th trial.
By properties of the binomial distribution, the probability of (i) is $\binom{k-1}{r-1}p^{r-1} (1-p)^{(k-1)-(r-1)}$.
And of course the probability of (ii) is $p$.
Multiply, and clean up a bit. We obtain the formula quoted in the post.
Remark: The third question has to do with the binomial, not the negative binomial. We want to evaluate the sum
$$\sum_{k=0}^n k\binom{n}{k}p^k(1-p)^{n-k},\tag{1}$$
which is the expression for the mean of the binomial. Alternately we could sum from $k=1$, since the $k=0$ term makes no contribution.
For $k\ne 0$, we have the identity
$$\binom{n}{k}=\frac{n}{k}\binom{n-1}{k-1}.$$
This identity can be verified by expressing the binomial coefficients on each side in terms of factorials. Using this identity, we find that the sum (1) is equal to
$$n\sum_{k=1}^n n p^k(1-p)^{n-k}.$$
This is
$$np\sum_{i=0}^{n-1} p^i q^{n-1-i},$$
which is $np$.
I don't like the above argument, it is messy. More importantly, it is computational, not *conceptual.
I would much rather say that the binomial is a sum $B_1+\cdots+B_n$ of independent Bernoulli. Each of the Bernoulli has expectation $p$. So by the linearity of expectation $E(B_1+\cdots+B_n)=E(B_1)+\cdots +E(B_n)=np$.