For me, a quaternion is a scalar plus a 3D vector. These can be called the real and imaginary parts.
Perpendicular vectors anticommute (i.e. $\mathbf{uv}=-\mathbf{vu}$) and parallel vectors commute. Scalars are central; they commute with everything. The square roots of $1$ are the scalars $\pm1$ and the square roots of $-1$ are precisely the unit 3D vectors. The absolute value of a quaternion $p=a+\mathbf{v}$ is $|p|=\sqrt{a^2+\|\mathbf{v}\|^2}$. This is multiplicative, i.e. $|pq|=|p||q|$ for all $p,q$. Every nonzero quaternion $p$ has a polar form $p=re^{\theta\mathbf{u}}=r(\cos\theta+\sin\theta\,\mathbf{u})$ where $r=|p|$ is the magnitude, $\theta$ is a convex angle (i.e. between $0$ and $\pi$), and $\mathbf{u}$ is $p$'s normalized imaginary part. (This is unique unless $p$ is real, in which case $\mathbf{u}$ can be any unit vector.)
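To make these rules concrete, here is a minimal sketch in Python, assuming a quaternion $a+\mathbf{v}$ is stored as a tuple `(a, x, y, z)`. The names `qmul`, `qabs` and `polar` are my own choices for illustration, not part of any library.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, x, y, z) tuples."""
    a1, x1, y1, z1 = p
    a2, x2, y2, z2 = q
    return (
        a1 * a2 - x1 * x2 - y1 * y2 - z1 * z2,
        a1 * x2 + x1 * a2 + y1 * z2 - z1 * y2,
        a1 * y2 - x1 * z2 + y1 * a2 + z1 * x2,
        a1 * z2 + x1 * y2 - y1 * x2 + z1 * a2,
    )

def qabs(p):
    """Absolute value |p| = sqrt(a^2 + ||v||^2)."""
    return math.sqrt(sum(c * c for c in p))

def polar(p):
    """Return (r, theta, u) with p = r(cos(theta) + sin(theta) u), 0 <= theta <= pi."""
    a, v = p[0], p[1:]
    vnorm = math.sqrt(sum(c * c for c in v))
    r = qabs(p)
    theta = math.atan2(vnorm, a)
    # For real p the axis is not determined; (1, 0, 0) is an arbitrary choice.
    u = tuple(c / vnorm for c in v) if vnorm > 0 else (1.0, 0.0, 0.0)
    return r, theta, u

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))  # perpendicular unit vectors anticommute: k and -k
```

Squaring any unit vector with `qmul` should give $-1$ up to rounding, and `qabs(qmul(p, q))` should agree with `qabs(p) * qabs(q)`.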
Incidentally, how should $pqp^{-1}$ be interpreted?
If $\mathbf{v}$ is a 3D vector and $p=re^{\theta\mathbf{u}}$ then $p\mathbf{v}p^{-1}$ is $\mathbf{v}$ rotated around the oriented $\mathbf{u}$-axis by $2\theta$. Note there is some redundancy: $r$ doesn't matter and $\pm p$ effect the same rotation. Thus $pqp^{-1}$ will have the effect of rotating $q$'s imaginary part by $2\theta$ around $p$'s imaginary part (as an axis).
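As a quick numerical check of the $2\theta$ claim, here is a small sketch, again assuming quaternions are `(a, x, y, z)` tuples. The helpers `qinv`, `conjugate_by` and `from_polar` are hypothetical names introduced only for this example.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, x, y, z) tuples."""
    a1, x1, y1, z1 = p
    a2, x2, y2, z2 = q
    return (
        a1 * a2 - x1 * x2 - y1 * y2 - z1 * z2,
        a1 * x2 + x1 * a2 + y1 * z2 - z1 * y2,
        a1 * y2 - x1 * z2 + y1 * a2 + z1 * x2,
        a1 * z2 + x1 * y2 - y1 * x2 + z1 * a2,
    )

def qinv(p):
    """Inverse p^{-1} = conj(p) / |p|^2 of a nonzero quaternion."""
    n2 = sum(c * c for c in p)
    return (p[0] / n2, -p[1] / n2, -p[2] / n2, -p[3] / n2)

def conjugate_by(p, q):
    """Compute p q p^{-1}."""
    return qmul(qmul(p, q), qinv(p))

def from_polar(r, theta, u):
    """Build r(cos(theta) + sin(theta) u) from a unit 3D vector u."""
    s = math.sin(theta)
    return (r * math.cos(theta), r * s * u[0], r * s * u[1], r * s * u[2])

p = from_polar(3.0, math.pi / 4, (0.0, 0.0, 1.0))  # theta = 45 degrees about the z-axis
v = (0.0, 1.0, 0.0, 0.0)                           # the vector (1, 0, 0) as a pure quaternion
print(conjugate_by(p, v))                          # approximately (0, 0, 1, 0): a 90 degree turn
```

Replacing `p` by `-p`, or changing the magnitude `3.0`, should leave the result unchanged, and a nonzero real part in the second argument passes through untouched.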
4D rotations, interpreted as functions which turn quaternions into quaternions, are all of the form $x\mapsto axb$ for unit quaternions $a$ and $b$, with some redundancy: $(a,b)$ and $(-a,-b)$ effect the same rotation.
Given any two unit quaternions $p$ and $q$, there will be infinitely many 4D rotations that turn $p$ into $q$ or vice-versa. The ones which are left or right multiplications are unique, though: for mapping $p$ to $q$ they are $x\mapsto (qp^{-1})x$ or $x\mapsto x(p^{-1}q)$, or for $q$ to $p$ they are $x\mapsto (pq^{-1})x$ or $x\mapsto x(q^{-1}p)$, respectively. In some sense, these are not "minimum energy," or the most efficient, rotations that do this (but nor are they quite the "maximum energy" ones either...).
The "minimum energy" rotation turning $p$ into $q$ is "halfway" between these, however: it is given by the expression $x\mapsto \sqrt{qp^{-1}}x\sqrt{p^{-1}q}$, where square roots are taken by halving convex angles in polar form, which I discuss in my answer here.
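Here is a sketch of that "halfway" map, assuming $p$ and $q$ are unit quaternions stored as `(a, x, y, z)` tuples. The function names (`qsqrt`, `halfway_rotation`, `apply`) are mine, and the square root simply halves the convex angle in polar form as described.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, x, y, z) tuples."""
    a1, x1, y1, z1 = p
    a2, x2, y2, z2 = q
    return (
        a1 * a2 - x1 * x2 - y1 * y2 - z1 * z2,
        a1 * x2 + x1 * a2 + y1 * z2 - z1 * y2,
        a1 * y2 - x1 * z2 + y1 * a2 + z1 * x2,
        a1 * z2 + x1 * y2 - y1 * x2 + z1 * a2,
    )

def qconj(p):
    """Conjugate; for a unit quaternion this is also the inverse."""
    return (p[0], -p[1], -p[2], -p[3])

def qsqrt(p):
    """Square root obtained by halving the convex angle in polar form."""
    a, v = p[0], p[1:]
    vnorm = math.sqrt(sum(c * c for c in v))
    root_r = math.sqrt(math.sqrt(a * a + vnorm * vnorm))
    half = math.atan2(vnorm, a) / 2
    # For real p the axis is arbitrary; this only matters for negative reals.
    u = tuple(c / vnorm for c in v) if vnorm > 0 else (1.0, 0.0, 0.0)
    s = root_r * math.sin(half)
    return (root_r * math.cos(half), s * u[0], s * u[1], s * u[2])

def halfway_rotation(p, q):
    """Return (a, b) with a = sqrt(q p^-1), b = sqrt(p^-1 q); p, q must be unit."""
    p_inv = qconj(p)
    return qsqrt(qmul(q, p_inv)), qsqrt(qmul(p_inv, q))

def apply(a, b, x):
    """The 4D rotation x -> a x b."""
    return qmul(qmul(a, x), b)

p, q = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)  # p = i, q = j
a, b = halfway_rotation(p, q)
print(apply(a, b, p))  # approximately (0, 0, 1, 0), i.e. q
```

The case $q=-p$ is not handled meaningfully here, since the square root of $-1$ is not unique and the sketch just picks one.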
In 3D we can think of the "energy" of a rotation as the size of the (convex) angle it rotates by: the bigger the angle, the greater the energy. Given two 3D unit vectors $\mathbf{u}$ and $\mathbf{v}$ (distinct, not antipodal), the "maximum energy" rotation taking $\mathbf{u}$ to $\mathbf{v}$ is by $180^{\circ}$ around the midpoint between them, and the "minimum energy" rotation is by the angle between them around the axis perpendicular to the plane they span.
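Both 3D rotations can be written as unit quaternions and checked numerically. The following is a sketch under the same `(a, x, y, z)` tuple convention; `min_energy_rotor` and `max_energy_rotor` are hypothetical names, and the code assumes $\mathbf{u}$ and $\mathbf{v}$ are unit, distinct and not antipodal.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, x, y, z) tuples."""
    a1, x1, y1, z1 = p
    a2, x2, y2, z2 = q
    return (
        a1 * a2 - x1 * x2 - y1 * y2 - z1 * z2,
        a1 * x2 + x1 * a2 + y1 * z2 - z1 * y2,
        a1 * y2 - x1 * z2 + y1 * a2 + z1 * x2,
        a1 * z2 + x1 * y2 - y1 * x2 + z1 * a2,
    )

def rotate(p, v):
    """p v p^{-1} for a unit quaternion p and a 3D vector v."""
    p_inv = (p[0], -p[1], -p[2], -p[3])
    return qmul(qmul(p, (0.0, *v)), p_inv)[1:]

def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

def min_energy_rotor(u, v):
    """Rotation by the angle between u and v, about the normal of their plane."""
    phi = math.acos(max(-1.0, min(1.0, dot(u, v))))
    n = normalize(cross(u, v))
    s = math.sin(phi / 2)  # the quaternion angle is half the rotation angle
    return (math.cos(phi / 2), s * n[0], s * n[1], s * n[2])

def max_energy_rotor(u, v):
    """Rotation by 180 degrees about the midpoint direction of u and v."""
    m = normalize(tuple(s + t for s, t in zip(u, v)))
    return (0.0, *m)  # cos(90 deg) + sin(90 deg) m

u, v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(rotate(min_energy_rotor(u, v), u))  # approximately (0, 1, 0)
print(rotate(max_energy_rotor(u, v), u))  # approximately (0, 1, 0)
```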
In 4D, every rotation is by two angles in two orthogonal planes (uniquely determined if the convex angles are distinct), and the "energy" is an increasing function of both angles. More precisely, we can measure the "energy" of a rotation matrix $R$ as $\|R-I\|^2$ where $I$ is the identity matrix and we use the Frobenius norm $\|A\|^2=\mathrm{tr}(A^TA)=\sum |a_{ij}|^2$.
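To compare energies numerically, one can build the $4\times4$ matrix of $x\mapsto axb$ in the basis $1,\mathbf{i},\mathbf{j},\mathbf{k}$ and evaluate $\|R-I\|^2$ directly. This is only a sketch with names of my own choosing (`rotation_matrix`, `energy`), assuming unit quaternions stored as `(a, x, y, z)` tuples.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, x, y, z) tuples."""
    a1, x1, y1, z1 = p
    a2, x2, y2, z2 = q
    return (
        a1 * a2 - x1 * x2 - y1 * y2 - z1 * z2,
        a1 * x2 + x1 * a2 + y1 * z2 - z1 * y2,
        a1 * y2 - x1 * z2 + y1 * a2 + z1 * x2,
        a1 * z2 + x1 * y2 - y1 * x2 + z1 * a2,
    )

def qconj(p):
    return (p[0], -p[1], -p[2], -p[3])

def qsqrt(p):
    """Square root obtained by halving the convex angle in polar form."""
    a, v = p[0], p[1:]
    vnorm = math.sqrt(sum(c * c for c in v))
    root_r = math.sqrt(math.sqrt(a * a + vnorm * vnorm))
    half = math.atan2(vnorm, a) / 2
    u = tuple(c / vnorm for c in v) if vnorm > 0 else (1.0, 0.0, 0.0)
    s = root_r * math.sin(half)
    return (root_r * math.cos(half), s * u[0], s * u[1], s * u[2])

ONE = (1.0, 0.0, 0.0, 0.0)
BASIS = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0),
         (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)]

def rotation_matrix(a, b):
    """Matrix of x -> a x b; column c holds the image of the c-th basis element."""
    cols = [qmul(qmul(a, e), b) for e in BASIS]
    return [[cols[c][r] for c in range(4)] for r in range(4)]

def energy(R):
    """Squared Frobenius norm of R - I."""
    return sum((R[r][c] - (1.0 if r == c else 0.0)) ** 2
               for r in range(4) for c in range(4))

p, q = (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)   # p = 1, q = i
left = rotation_matrix(qmul(q, qconj(p)), ONE)       # x -> (q p^-1) x
right = rotation_matrix(ONE, qmul(qconj(p), q))      # x -> x (p^-1 q)
mid = rotation_matrix(qsqrt(qmul(q, qconj(p))), qsqrt(qmul(qconj(p), q)))
print(energy(left), energy(right), energy(mid))      # approximately 8, 8 and 4 for this pair
```

For this pair the left and right multiplications both rotate two orthogonal planes by $90^{\circ}$, while the combined map rotates only the plane spanned by $p$ and $q$, which is why it comes out with the smaller value.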