The reason we can change/deform contours is a generalized version of Stokes' Theorem, which has an extremely intuitive explanation here on Wikipedia (see "underlying principles"). Here is another here. The hand-wavy reason it works is that the curl of a holomorphic (complex-differentiable) function is $0$.
Circles are really easy to parametrize, which is why we (generally) integrate about circles rather than, say, triangles. This is due to Euler's formula, $e^{i\theta}=\cos\theta+i\sin\theta$.
So when we integrate about a curve, $C(\theta)$, we multiply by a differential $dC(\theta)$ such that
$$(1) \quad dC(\theta)=C(\theta+d\theta)-C(\theta)$$
Multiplying and dividing the right hand side of $(1)$ by $d\theta$ yields,
$$(2) \quad dC(\theta)= \cfrac{dC}{d\theta} \cdot d\theta$$
So,
$$(3) \quad \int_C f(C(\theta)) \ dC(\theta)=\int f(C(\theta)) \cdot \cfrac{dC}{d\theta} \ d\theta$$
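As a rough numerical sanity check of $(3)$, here is a minimal sketch that approximates the right hand side with a Riemann sum. It assumes the unit circle $C(\theta)=e^{i\theta}$ and the test function $f(z)=1/z$; the names `line_integral`, `unit_circle` and `unit_circle_derivative` are made up for this illustration.

```python
import cmath
import math

def line_integral(f, curve, dcurve, n=10_000):
    """Riemann-sum approximation of the right hand side of (3), theta in [0, 2*pi)."""
    dtheta = 2 * math.pi / n
    total = 0j
    for k in range(n):
        theta = k * dtheta
        # f(C(theta)) * dC/dtheta * dtheta
        total += f(curve(theta)) * dcurve(theta) * dtheta
    return total

# Assumed curve: the unit circle C(theta) = e^{i theta}, so dC/dtheta = i e^{i theta}.
unit_circle = lambda theta: cmath.exp(1j * theta)
unit_circle_derivative = lambda theta: 1j * cmath.exp(1j * theta)

result = line_integral(lambda z: 1 / z, unit_circle, unit_circle_derivative)
print(result)  # should be close to 2*pi*i
```

Any other closed curve can be tried by passing a different parametrization together with its derivative.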
The meaning of $(3)$ is as follows. The Line Integral of a function $f(x)$ about a curve $C(\theta)$ gives the average value of $f \cdot \cfrac{dC}{|dC|}$ on that curve multiplied by the length of the curve.
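One way to see where this reading comes from, sketched under the assumption that the curve has a finite length $L=\int_C |dC|$ (the symbol $L$ is introduced here only for illustration), is to multiply and divide by $|dC|$ and then by $L$:

$$\int_C f \ dC=\int_C \left(f \cdot \cfrac{dC}{|dC|}\right) |dC| = L \cdot \left(\cfrac{1}{L} \int_C f \cdot \cfrac{dC}{|dC|} \ |dC|\right)$$

The bracket on the far right is the average of $f \cdot \cfrac{dC}{|dC|}$ with respect to arc length, and it is multiplied by the length $L$.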
If you accept this claim, we can simply substitute the functions in question. However, that isn't really the issue, is it?
First, let's address a major issue; why does $(3)$ average over $f \cdot \cfrac{dC}{|dC|}$?
It'll take some imagination, but a reasonable explanation can be given. Imagine $f(t)$ gives the speed of a walking person at a particular time. If it's positive, (s)he's walking to the right. If it's negative, (s)he's walking to the left. Now imagine that we are viewing this on a TV rather than in real life. If we sped up time, i.e. fast-forwarded, the person would appear to be walking faster. If we rewound, the person would switch directions. However, if we have a real time $p$, then we can know the time on the TV by making $t$ a function of the real time $p$. Now, if $dt$ is negative, we know that the time on the TV is going backward. However, if the person's speed is still positive, we know that the true speed, relative to the observer at time $p$, is actually negative. So if we want the average speed of the person, it's not enough to watch the TV; you also need to know how time is progressing. That's why $(3)$ can be negative. However, if time progresses normally relative to the observer, the ratio becomes unity.
So, in short, $(3)$ averages, but does so with respect to direction. Sadly, explaining imaginary time would take a lot of real time, so we'll have to settle for the above sentence rather than the time analogy.
However, recall that multiplication by a complex number $v$ rotates the multiplicand $w$ by $arg(v)$ degrees/radians and scales by $|v|$ in the complex plane. For a more in depth explanation and review see here.
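The rotate-and-scale behaviour can be checked on a concrete pair of numbers. This is only a small sketch; the values of `v` and `w` are arbitrary choices made for illustration, and the angles add exactly only up to multiples of $2\pi$.

```python
import cmath
import math

v = 2j           # |v| = 2,       arg(v) = pi/2
w = 1 + 1j       # |w| = sqrt(2), arg(w) = pi/4
product = v * w  # w rotated by arg(v) and scaled by |v|

# The lengths multiply and the angles add.
scale_ok = math.isclose(abs(product), abs(v) * abs(w))
angle_ok = math.isclose(cmath.phase(product), cmath.phase(v) + cmath.phase(w))
print(product, scale_ok, angle_ok)
```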
So, $\cfrac{dC(\theta)}{|dC(\theta)|}$ gives the normed differential, the direction of the differential, at an angle $\theta$.
Imagine lines running from the origin to a point on $C$. Then imagine another line going from that point, tangent to the curve and pointing counter-clockwise. Move this new line and place its tail at the origin. You should now see a line representing $C(\theta)$ and another line representing the direction of $dC$. Notice that for the complex circle, $90^\circ$ separates these lines. Since multiplication can rotate, the direction of $dC$ is $C$ rotated by $90^\circ$. Better yet,
$(4) \quad dC=i \cdot C \cdot d\theta$
$\Rightarrow arg(dC)=arg(C)+\pi/2$
Where we've switched to radians. It's important to realize that it's the difference between the directions, angles, that's key. In symbols,
$(5) \quad \Delta \theta=arg(dC)-arg(C)=\pi/2$
Which is constant. We know that,
$(6) \quad arg(v \cdot w)=arg(w)+arg(v)$
By extension,
$(7) \quad arg \left(\cfrac{w}{v} \right)=arg(w)-arg(v)$
Applying $(7)$ with $w=dC$ and $v=C$ yields,
$(8) \quad arg \left(\cfrac{dC}{C} \right)=arg(dC)-arg(C)$
Thus, if we want the integrand of $(3)$ to keep a constant direction, i.e. if we want the angle difference to be constant, we should have $f=\cfrac{1}{C(\theta)}$.
What about $f=\cfrac{1}{C(\theta)^2}$? It rotates faster than $dC$, so the difference between angles won't be constant.
What about $f=\cfrac{1}{C(\theta)^0}$? This time it rotates too slowly for the difference to be constant.
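The three cases can be compared numerically. The sketch below assumes the unit circle $C(\theta)=e^{i\theta}$ and uses a plain Riemann sum; the helper name `circle_integral_of_power` is made up for this illustration. Only the exponent $1$ should give a result that does not cancel out.

```python
import cmath
import math

def circle_integral_of_power(n, samples=10_000):
    """Riemann sum of f(C) * dC/dtheta * dtheta with f = 1/C**n on the unit circle."""
    dtheta = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        c = cmath.exp(1j * k * dtheta)  # C(theta)
        dc = 1j * c                     # dC/dtheta for the unit circle
        total += dc / c**n * dtheta
    return total

integrals = {n: circle_integral_of_power(n) for n in (0, 1, 2)}
for n, value in integrals.items():
    # Magnitude is near 0 for n = 0 and n = 2, and near 2*pi for n = 1.
    print(n, abs(value))
```

For exponents $0$ and $2$ the integrand keeps turning as $\theta$ goes around, so its contributions cancel over a full loop.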
Finally, putting everything together,
$$(9) \quad \cfrac{1}{2 \cdot \pi} \cdot \int_{C_r} \cfrac{dC}{C-z_0}=i$$
This says that the average angle between $dC$ and $\cfrac{1}{C-z_0}$ in the complex plane is $\pi/2$ radians, or $90^\circ$, which in this case is best represented as $i$.
With this intuition it's now easy to see how Cauchy's Integral Formula works in general.
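For completeness, $(9)$ can also be checked by direct substitution. This sketch assumes that $C_r$ denotes the circle of radius $r$ centered at $z_0$, parametrized counter-clockwise by $\theta \in [0, 2\pi]$:

$$C(\theta)=z_0+r \cdot e^{i \theta} \quad \Rightarrow \quad \cfrac{dC}{d\theta}=i \cdot r \cdot e^{i \theta}$$

$$\cfrac{1}{2 \cdot \pi} \cdot \int_{C_r} \cfrac{dC}{C-z_0}=\cfrac{1}{2 \cdot \pi} \cdot \int_0^{2 \pi} \cfrac{i \cdot r \cdot e^{i \theta}}{r \cdot e^{i \theta}} \ d\theta=\cfrac{1}{2 \cdot \pi} \cdot 2 \cdot \pi \cdot i=i$$

The radius $r$ cancels, so under this assumption the result does not depend on the size of the circle.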
(Reproduced with permission from here)