If we interpret the determinant of a diagonal matrix $D$ (one whose only nonzero entries lie on the diagonal) as the signed volume of the image of the unit cube under the corresponding linear transformation, then it is clear that $\det(D)=\prod_{k=1}^{n} d_{kk}$, since we are just calculating the (signed) volume of an $n$-dimensional box with side lengths $|d_{kk}|$.
But why is it that the above equation is true even if there are nonzero entries above the diagonal?
If there is only one nonzero entry above the diagonal, say an $m$ in position $(i, j)$ with $i<j$, then, I think, the cube is sheared: the image of the $j$th basis vector is shifted by $m$ units along the $i$th axis, while the images of the other basis vectors stay where they were. In this case the volume is the same as if all entries above the diagonal were zero (it's analogous to shearing a square into a parallelogram in two dimensions).
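To make this concrete, here is the two-dimensional case as I picture it (the names $a$, $d$ and $m$ are just my own labels, and I assume $a, d > 0$ for simplicity):

$$
\begin{pmatrix} a & m \\ 0 & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ 0 \end{pmatrix},
\qquad
\begin{pmatrix} a & m \\ 0 & d \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} m \\ d \end{pmatrix}.
$$

So the unit square goes to the parallelogram spanned by $(a, 0)$ and $(m, d)$, which has base $a$ and height $d$, and hence area $ad$ no matter what $m$ is.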
But once I try to imagine how the shape is transformed when there is more than one nonzero entry above the diagonal, my imagination breaks down.
I would appreciate any help.