First, prove that the dot product is distributive, that is:
$$(\mathbf{A}+\mathbf{B})\cdot\mathbf{C}= \mathbf{A}\cdot\mathbf{C}+\mathbf{B}\cdot\mathbf{C}\tag{1}$$
You can do this with the help of the "parallelogram construction" of vector addition and basic trigonometry.
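One way to fill in this step is sketched below. It assumes the geometric definition $\mathbf{X}\cdot\mathbf{C}=|\mathbf{X}|\,|\mathbf{C}|\cos\theta_X$, where $\theta_X$ is the angle between $\mathbf{X}$ and $\mathbf{C}$, and $\mathbf{C}\neq\mathbf{0}$ (for $\mathbf{C}=\mathbf{0}$ both sides of (1) vanish). The factor $|\mathbf{X}|\cos\theta_X$ is the signed length of the orthogonal projection of $\mathbf{X}$ onto the line through $\mathbf{C}$. In the parallelogram construction, $\mathbf{A}+\mathbf{B}$ is reached by placing $\mathbf{B}$ tail-to-head after $\mathbf{A}$. Its projection is therefore the projection of $\mathbf{A}$ followed by that of $\mathbf{B}$, so the signed lengths add:

$$|\mathbf{A}+\mathbf{B}|\cos\theta_{A+B}=|\mathbf{A}|\cos\theta_A+|\mathbf{B}|\cos\theta_B$$

Multiplying both sides by $|\mathbf{C}|$ gives (1).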
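It is plain sailing from here. We write both vectors in the dot product as superpositions of the basis vectors and use (1), together with the symmetry of the dot product and $(\lambda\mathbf{A})\cdot\mathbf{B}=\lambda\,(\mathbf{A}\cdot\mathbf{B})$ (both immediate from the geometric definition), to expand the product term by term. The orthogonality of the basis vectors (which holds by definition) then eliminates the cross terms of the form $A_j\,B_k$ with $k\neq j$. Since each basis vector also has unit length, every surviving term is simply $A_j\,B_j$.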
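Here is the expansion written out for an orthonormal basis $\hat{\mathbf{e}}_1,\dots,\hat{\mathbf{e}}_n$. The basis symbols and the Kronecker delta $\delta_{jk}$ (equal to 1 if $j=k$ and 0 otherwise) are notation introduced here for illustration.

$$\mathbf{A}\cdot\mathbf{B}=\Big(\sum_{j}A_j\,\hat{\mathbf{e}}_j\Big)\cdot\Big(\sum_{k}B_k\,\hat{\mathbf{e}}_k\Big)=\sum_{j,k}A_j\,B_k\,(\hat{\mathbf{e}}_j\cdot\hat{\mathbf{e}}_k)=\sum_{j,k}A_j\,B_k\,\delta_{jk}=\sum_{j}A_j\,B_j$$

The second equality is where distributivity and scaling do the work. The third is orthonormality.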
We can also note here that the Gram-Schmidt procedure guarantees that we can always choose a basis that is orthonormal with respect to any given inner product, which brings us to the second way of looking at the problem.
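Below is a minimal sketch in Python. It assumes the inner product is given as $\langle\mathbf{x},\mathbf{y}\rangle=\mathbf{x}^{\mathsf T}M\,\mathbf{y}$ for a symmetric positive-definite matrix $M$. The matrix and all function names are hypothetical, chosen only for illustration. The code orthonormalizes linearly independent vectors with respect to that inner product. It then checks that the inner product of two vectors equals the plain sum of products of their coordinates in the new basis, which is the component formula from the first approach.

```python
from math import isclose, sqrt
from typing import Callable, List

Vector = List[float]
InnerProduct = Callable[[Vector, Vector], float]


def matrix_inner_product(m: List[List[float]]) -> InnerProduct:
    """Return <x, y> = x^T M y. M is assumed symmetric positive-definite (not checked)."""
    def ip(x: Vector, y: Vector) -> float:
        return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
    return ip


def gram_schmidt(vectors: List[Vector], ip: InnerProduct) -> List[Vector]:
    """Orthonormalize linearly independent vectors with respect to `ip`
    (modified Gram-Schmidt: each projection uses the running residual)."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coeff = ip(w, e)  # component of the residual along e (e has unit length)
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        # Clamp at 0 so rounding on a dependent input cannot produce sqrt(negative).
        norm = sqrt(max(ip(w, w), 0.0))
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def coordinates(x: Vector, basis: List[Vector], ip: InnerProduct) -> Vector:
    """Coordinates of x in an orthonormal basis: the j-th one is <x, e_j>."""
    return [ip(x, e) for e in basis]


# Example: a non-Euclidean inner product on R^2.
M = [[2.0, 1.0], [1.0, 3.0]]
ip = matrix_inner_product(M)
basis = gram_schmidt([[1.0, 0.0], [0.0, 1.0]], ip)

x, y = [1.0, 2.0], [3.0, -1.0]
cx, cy = coordinates(x, basis, ip), coordinates(y, basis, ip)
component_sum = sum(a * b for a, b in zip(cx, cy))

# The inner product and the sum of coordinate products agree up to rounding.
assert isclose(ip(x, y), component_sum)
print(ip(x, y), component_sum)
```

The modified variant of Gram-Schmidt is used because it is numerically better behaved than the classical one. In exact arithmetic both produce the same basis.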
We begin with an abstract inner product, i.e. a symmetric, bilinear (so that it is distributive by definition) scalar product for which $\mathbf{A}\cdot\mathbf{A}$ is strictly positive for every nonzero $\mathbf{A}$ and nought for $\mathbf{A}=\mathbf{0}$. From the inner product definition alone, one can prove the Cauchy-Schwarz inequality, and from there we can show that the range of possible values of
$$c\stackrel{def}{=}\frac{\mathbf{A}\cdot\mathbf{B}}{|\mathbf{A}|\,|\mathbf{B}|}$$
is $[-1,\,1]$, and that every value in this interval can be realized. Therefore there is no problem in defining $c$ to be the cosine of an angle, since $\arccos(c)$ always exists and is real.
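Here is a standard route to both claims. It uses only the axioms listed above and writes $|\mathbf{A}|=\sqrt{\mathbf{A}\cdot\mathbf{A}}$. For $\mathbf{B}\neq\mathbf{0}$ and any real $t$, positivity gives

$$0\le(\mathbf{A}-t\mathbf{B})\cdot(\mathbf{A}-t\mathbf{B})=|\mathbf{A}|^2-2t\,\mathbf{A}\cdot\mathbf{B}+t^2|\mathbf{B}|^2$$

Choosing $t=\mathbf{A}\cdot\mathbf{B}/|\mathbf{B}|^2$ yields $(\mathbf{A}\cdot\mathbf{B})^2\le|\mathbf{A}|^2|\mathbf{B}|^2$, i.e. $|c|\le1$.

For realizability, assume the space has dimension at least 2 and take two orthonormal vectors $\hat{\mathbf{e}}_1,\hat{\mathbf{e}}_2$ (Gram-Schmidt supplies them). Then for any $\varphi\in[0,\pi]$:

$$\mathbf{A}=\hat{\mathbf{e}}_1,\quad \mathbf{B}=\cos\varphi\,\hat{\mathbf{e}}_1+\sin\varphi\,\hat{\mathbf{e}}_2\quad\Longrightarrow\quad c=\cos\varphi$$

As $\varphi$ ranges over $[0,\pi]$, $c$ sweeps out all of $[-1,\,1]$.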