You got plenty of answers, but Sami Ben Romdhane's post is worth expanding upon, in my opinion, because if you have just started studying linear algebra, this approach will be relatively new to you.
I don't know how familiar you are with the terminology, so I will assume you know what $\mathbb{R}^3$ is, what matrices and determinants are, and of course the standard scalar and cross products in $\mathbb{R}^3$.
If $A$ is a function between vector spaces, for example from $\mathbb{R}^n$ to $\mathbb{R}^m$, we call it linear if for every $x,y\in\mathbb{R}^n$ and every $\alpha,\beta\in\mathbb{R}$ we have $$ A(\alpha x+\beta y)=\alpha A(x)+\beta A(y). $$
As you can see, this is rather similar to distributivity.
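To make the definition concrete, here is a minimal Python sketch, assuming a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ is represented by an $m\times n$ matrix stored as a list of rows; `apply` and `is_linear_on` are hypothetical helper names. It tests the defining identity for particular inputs only, so it can expose a map that is not linear, but it cannot prove that a map is linear.

```python
def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_linear_on(f, x, y, alpha, beta):
    """Check f(alpha*x + beta*y) == alpha*f(x) + beta*f(y) for these particular inputs."""
    combo = [alpha * xi + beta * yi for xi, yi in zip(x, y)]
    lhs = f(combo)
    rhs = [alpha * fx + beta * fy for fx, fy in zip(f(x), f(y))]
    return lhs == rhs

# A 2x3 matrix gives a map from R^3 to R^2.
A = [[1, 2, 0],
     [0, -1, 3]]
x, y = [1, 0, 2], [4, -1, 5]

print(is_linear_on(lambda v: apply(A, v), x, y, 2, -3))   # True

# Shifting every coordinate by 1 is not linear: the identity fails.
shift = lambda v: [vi + 1 for vi in v]
print(is_linear_on(shift, x, y, 2, -3))                    # False
```

Integer entries are used on purpose, so the equality test is exact and not affected by floating-point rounding.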
Let us define volume before area. Let $u,v,w\in\mathbb{R}^3$ be vectors, and let $\mathrm{vol}$ be the function that takes three vectors and returns the (oriented) volume of the parallelepiped they span, so $$ \mathrm{vol}:\mathbb{R}^3\times\mathbb{R}^3\times\mathbb{R}^3\rightarrow\mathbb{R},\ (u,v,w)\mapsto\mathrm{vol}(u,v,w). $$
What sort of properties does this function have?
You can check by elementary geometric methods that the volume of a parallelepiped is linear in each of its three independent edges separately, provided you allow the volume to be oriented. Here oriented means that it can take negative values, and that if you exchange two of its edges, it changes sign.
This determines most of the properties of the $\mathrm{vol}$ function: $\mathrm{vol}(u,v,w)$ must be linear in each of its three variables separately, and it must be so-called "skew-symmetric", which means that swapping any two of its variables changes its sign, for example $$ \mathrm{vol}(u,v,w)=-\mathrm{vol}(v,u,w). $$
This alone does not determine the volume function exactly; we also need a normalization. Let $$ e_1=(1,0,0),\ e_2=(0,1,0),\ e_3=(0,0,1) $$ be the standard basis vectors of $\mathbb{R}^3$. You know that this is an orthonormal basis, meaning that each of them has unit length and any two different ones are orthogonal to each other; moreover, we declare the ordered triple $(e_1,e_2,e_3)$ to be positively oriented.
This means that $e_1,e_2,e_3$ basically span the "unit cube", so we want $$ \mathrm{vol}(e_1,e_2,e_3)=1. $$
These properties now uniquely determine the volume function. If $$ u=u_1e_1+u_2e_2+u_3e_3 \\ v=v_1e_1+v_2e_2+v_3e_3 \\ w=w_1e_1+w_2e_2+w_3e_3,$$ then $$ \mathrm{vol}(u,v,w)=\mathrm{vol}\left(\sum_{i=1}^3u_ie_i,\sum_{j=1}^3v_je_j,\sum_{k=1}^3w_ke_k\right), $$ and by linearity in each variable this equals $$ \sum_{i,j,k=1}^3u_iv_jw_k\,\mathrm{vol}(e_i,e_j,e_k). $$ We know that $\mathrm{vol}(e_1,e_2,e_3)=1$, and skew-symmetry determines the remaining values: for example $\mathrm{vol}(e_2,e_1,e_3)=-1$, and $\mathrm{vol}(e_1,e_1,e_3)=0$, because swapping the two equal arguments must flip the sign without changing anything, so the value equals its own negative. In general, $\mathrm{vol}(e_i,e_j,e_k)$ is $1$ if $(i,j,k)$ is an even permutation of $(1,2,3)$, $0$ if any index is repeated, and $-1$ if $(i,j,k)$ is an odd permutation of $(1,2,3)$.
(If you have ever encountered the so-called Levi-Civita symbol, $\epsilon_{ijk}$, then this is its definition, so $\mathrm{vol}(e_i,e_j,e_k)=\epsilon_{ijk}$.)
Because of this, instead of summing over indices we can sum over permutations. Let $S_3$ be the symmetric group on $\{1,2,3\}$, that is, the set of all permutations of $(1,2,3)$. The terms with a repeated index vanish, and for a permutation $\pi$ we have $\mathrm{vol}(e_{\pi(1)},e_{\pi(2)},e_{\pi(3)})=\mathrm{sgn}(\pi)\,\mathrm{vol}(e_1,e_2,e_3)$, so $$ \begin{aligned}\sum_{i,j,k=1}^3u_iv_jw_k\,\mathrm{vol}(e_i,e_j,e_k)&=\sum_{\pi\in S_3}\mathrm{sgn}(\pi)u_{\pi(1)}v_{\pi(2)}w_{\pi(3)}\,\mathrm{vol}(e_1,e_2,e_3)\\&=\sum_{\pi\in S_3}\mathrm{sgn}(\pi)u_{\pi(1)}v_{\pi(2)}w_{\pi(3)},\end{aligned} $$ where $\mathrm{sgn}(\pi)$ is the sign of the permutation $\pi$. What we got here is exactly the definition of the determinant whose rows (or columns) are the vectors $u$, $v$ and $w$.
So $$ \mathrm{vol}(u,v,w)=\det(u,v,w)=\begin{vmatrix}u_1&u_2&u_3\\v_1&v_2&v_3\\w_1&w_2&w_3\end{vmatrix} .$$
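The derivation above can be checked numerically. The following minimal Python sketch (the helper names are my own, and indices are 0-based, so $e_1$ corresponds to index 0) computes the oriented volume both as the 27-term sum $\sum_{i,j,k}u_iv_jw_k\epsilon_{ijk}$ and as the 6-term sum over $S_3$; the two should agree, as the argument predicts.

```python
from itertools import permutations, product

def sign(perm):
    """Sign of a permutation of (0, 1, 2): +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def levi_civita(i, j, k):
    """epsilon_{ijk} for 0-based indices: 0 if an index repeats, else the permutation sign."""
    if len({i, j, k}) < 3:
        return 0
    return sign((i, j, k))

def vol_index_sum(u, v, w):
    """Sum over all 27 index triples of u_i * v_j * w_k * epsilon_{ijk}."""
    return sum(u[i] * v[j] * w[k] * levi_civita(i, j, k)
               for i, j, k in product(range(3), repeat=3))

def det_permutation_sum(u, v, w):
    """Sum over the 6 permutations pi in S_3 of sgn(pi) * u_{pi(1)} * v_{pi(2)} * w_{pi(3)}."""
    return sum(sign(pi) * u[pi[0]] * v[pi[1]] * w[pi[2]]
               for pi in permutations(range(3)))

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(vol_index_sum(e1, e2, e3), vol_index_sum(e2, e1, e3), vol_index_sum(e1, e1, e3))  # 1 -1 0

u, v, w = [1, 0, 0], [1, 2, 0], [0, 1, 3]
print(vol_index_sum(u, v, w), det_permutation_sum(u, v, w))  # 6 6
```

For this particular choice, $u$ and $v$ lie in the $xy$-plane and span a parallelogram of area $2$, and $w$ rises to height $3$ above that plane, so the parallelepiped has volume $2\cdot3=6$, in agreement with both sums. The permutation version simply never generates the vanishing terms with repeated indices.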
Now, as we have noted, the volume function, and thus the determinant, is linear in each of its variables separately. So let $u,v\in\mathbb{R}^3$ be fixed vectors and let $x$ be a variable vector; then $$ \det(u,v,x) $$ is linear in $x$. Therefore the rule $x\mapsto \det(u,v,x)$ is a linear function from $\mathbb{R}^3$ to $\mathbb{R}$: it maps the vector $x$ to the number $\det(u,v,x)$.
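For instance, with fixed vectors of my own choosing, a minimal Python sketch can check the identity $f(\alpha x+\beta y)=\alpha f(x)+\beta f(y)$ for $f(x)=\det(u,v,x)$. Here `det3` is a hypothetical helper computing the $3\times3$ determinant by cofactor expansion along the first row; a single check like this only illustrates linearity, it does not prove it.

```python
def det3(u, v, w):
    """3x3 determinant with rows u, v, w (cofactor expansion along the first row)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

u, v = [1, 2, 0], [0, 1, 3]          # fixed vectors
f = lambda x: det3(u, v, x)          # the map x -> det(u, v, x), from R^3 to R

x, y = [2, -1, 4], [0, 5, 1]
alpha, beta = 3, -2
lhs = f([alpha * a + beta * b for a, b in zip(x, y)])
rhs = alpha * f(x) + beta * f(y)
print(lhs == rhs)                    # True
```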
There is a theorem of linear algebra, which you are probably not yet familiar with, stating that if $V$ is a finite-dimensional vector space with a scalar product on it (like $\mathbb{R}^3$), then the set of all real-valued linear functions defined on $V$ can be identified with $V$ itself.
In our example, this means that if $\omega:\mathbb{R}^3\rightarrow\mathbb{R}$ is a linear function, mapping vectors to numbers, then there is a unique vector $y_\omega\in\mathbb{R}^3$ such that $\omega(x)=y_\omega\cdot x$ for every vector $x$.
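In $\mathbb{R}^3$ with the standard scalar product, $y_\omega$ can be read off explicitly: by linearity, $\omega(x)=\omega\left(\sum_i x_ie_i\right)=\sum_i x_i\,\omega(e_i)$, so $y_\omega=(\omega(e_1),\omega(e_2),\omega(e_3))$. A minimal Python sketch of this recipe, where `representing_vector` and the sample `omega` are illustrative choices:

```python
def dot(a, b):
    """Standard scalar product in R^3."""
    return sum(ai * bi for ai, bi in zip(a, b))

def representing_vector(omega):
    """Return y with omega(x) == dot(y, x) for all x, assuming omega is a linear
    map R^3 -> R. Uses y_i = omega(e_i) for the standard basis vectors e_i."""
    basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    return [omega(e) for e in basis]

# An example linear function R^3 -> R.
omega = lambda x: 2 * x[0] - x[1] + 5 * x[2]
y = representing_vector(omega)
print(y)                                         # [2, -1, 5]
print(omega([3, 4, -1]) == dot(y, [3, 4, -1]))   # True
```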
Our linear function in this case is the map $x\mapsto\det(u,v,x)$, which also depends on $u$ and $v$. Let us denote the unique vector corresponding to this linear map by $u\times v$. Then by the theorem above $$ \det(u,v,x)=(u\times v)\cdot x, $$ and since the determinant is linear and skew-symmetric in $u$ and $v$ too, the vector $u\times v$ must also be linear in each of $u$ and $v$, and skew-symmetric, i.e. $u\times v=-(v\times u)$.
This $u\times v$ vector is what we call the cross product of $u$ and $v$.
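Combining the last two steps, the $i$-th component of $u\times v$ is $(u\times v)\cdot e_i=\det(u,v,e_i)$. The minimal Python sketch below (helper names are my own) builds $u\times v$ that way and compares it with the familiar component formula; it also illustrates skew-symmetry and the identity $\det(u,v,x)=(u\times v)\cdot x$ for one sample $x$.

```python
def det3(u, v, w):
    """3x3 determinant with rows u, v, w (cofactor expansion along the first row)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross_via_det(u, v):
    """u x v as the vector whose i-th component is det(u, v, e_i)."""
    basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    return [det3(u, v, e) for e in basis]

def cross_formula(u, v):
    """The familiar component formula for the cross product."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

u, v = [1, 2, 3], [4, 5, 6]
c = cross_via_det(u, v)
print(c, cross_formula(u, v))      # [-3, 6, -3] [-3, 6, -3]
print(cross_via_det(v, u))         # [3, -6, 3]  (skew-symmetry)
x = [2, 0, -1]
print(det3(u, v, x) == dot(c, x))  # True
```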
Now, why is this orthogonal to $u$ and $v$?
We see that the cross product of $u$ and $v$ is the vector whose scalar product with an arbitrary vector $x$ is the (oriented) volume of the parallelepiped spanned by $u$, $v$ and $x$. Recall that, for vectors of fixed lengths, the scalar product is largest in absolute value when the two vectors are parallel, and the scalar product of two orthogonal vectors is zero.
We also know that, for a fixed base and a "third" vector of fixed length, the volume of a parallelepiped is largest in absolute value when the "third" vector is orthogonal to the base (the parallelogram spanned by the "first two" vectors), and zero when the "third" vector lies in the plane of the base.
Putting the two together: among vectors $x$ of a fixed length, $|\det(u,v,x)|$ is largest when $x$ is orthogonal to the plane spanned by $u$ and $v$, while $|(u\times v)\cdot x|$ is largest when $x$ is parallel to $u\times v$. Since the two expressions are equal, $u\times v$ must be orthogonal to that plane, that is, orthogonal to both $u$ and $v$. (The same conclusion also follows directly: $(u\times v)\cdot u=\det(u,v,u)=0$, since a determinant with two equal rows vanishes, and likewise $(u\times v)\cdot v=\det(u,v,v)=0$.)
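A quick numerical illustration (not a proof): the minimal Python sketch below, with arbitrarily chosen $u$, $v$, sample size and random seed, checks the orthogonality of $u\times v$ to $u$ and $v$ directly, and checks the maximization argument by sampling: no random unit vector $x$ gives a larger $\det(u,v,x)$ than the unit vector in the direction of $u\times v$.

```python
import math
import random

def det3(u, v, w):
    """3x3 determinant with rows u, v, w (cofactor expansion along the first row)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def cross(u, v):
    """Component formula for the cross product u x v."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

u, v = [1, 2, 0], [0, 1, 3]
c = cross(u, v)                      # [6, -3, 1]
print(dot(c, u), dot(c, v))          # 0 0

# Among unit vectors x, det(u, v, x) = c . x is largest at x = c / |c|,
# where it equals |c| (Cauchy-Schwarz). Random unit vectors should not beat it.
norm_c = math.sqrt(dot(c, c))
x_star = [ci / norm_c for ci in c]
random.seed(0)
best = -math.inf
for _ in range(10_000):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    length = math.sqrt(dot(x, x))
    best = max(best, det3(u, v, [xi / length for xi in x]))
print(best <= det3(u, v, x_star) + 1e-9)   # True
```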