Here is some progress in your solution:
First, note that this problem, in fact, has a solution since the unit sphere is compact and the objective function is continuous. Second, note that the problem is not convex since the restriction $\|a\|=1$ is not affine. On the other hand, the problem is equivalent to
$$min \sum_{i=1}^n(y_i -a^tx_i)^2,\;\; s.t\; \|a\|_2^2-1=0. $$ The advantage of this formulation is that it will make the constraint smooth and will simplify a lot of computation. Since the problem is nonconvex and the LICQ is satisfied at any feasible point, the most we can do is to find all KKT points and choose, among those, the one at which the function $$f(a)= \sum_{i=1}^n (y_i-a^Tx_i)^2$$ attains its minimum value. We proceed as follows:
The lagrangian in this case is
$$L(a,\lambda)= f(a)+\lambda(\|a\|_2^2-1),$$ from which we find that $a$ is KKT iff
$$0= \sum_{i=1}^n-2(y_i-a^Tx_i)x_i +2\lambda a.$$
Now we introduce some notation:
$$X=[x_1,\ldots, x_n],\; A= \sum_{i=1}^{n} x_ix_i^T,\; y= (y_1,\ldots,y_n)^T,\; v=Xy, \; b=A^{-1}v,\; b'=A^{-1}b.$$ Whith this in mind, the KKT condition is just
$$\lambda a= \sum_{i=1}^ny_ix_i -\sum_{i=1}^n(a^tx_i) x_i= Xy- Aa=v-Aa, $$ or equivalently, $$(A+\lambda I)a= v.$$ We only analyze the following case:
$\textbf{Case:}\;-\lambda \notin \sigma(A)$(the set of eigenvalues of A)
Hence $A+\lambda I$ is invertible. Furthermore, its inverse is given by
$$(A+\lambda I)^{-1} =A^{-1} - g(\lambda) A^{-2},$$ where $$g(\lambda)=\frac{\lambda}{1+\lambda \;Tr(A^{-1})}$$ (See this page and this page). So, we have in this case
$$a=(A+\lambda I)^{-1} v= A^{-1}v - g(\lambda) A^{-2}v= b- g(\lambda)b'. $$
But now we use the constraint $\|a\|_2^2=1$ to find $\lambda.$ We should have then
$$\|b\|^2_2- 2g(\lambda)b^Tb' +g^2(\lambda)\|b'\|_2^2=1,$$ from which
$$(\|b'\|_2^2)g^2(\lambda) +(-2b^Tb') g(\lambda)+(\|b\|^2_2-1)=0.$$ The trick now is to note that this is a quadratic equation in $g(\lambda).$ The discriminant of this equation is
$$D= 4((b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)).$$ If $D<0,$ we conclude that there are not KKT points in this case. Otherwise, we will have
$$g(\lambda)= \frac{b^Tb'+_{-}\sqrt{(b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)}}{\|b'\|_2^2}.$$
Denote $$r_{+}=\frac{b^Tb'+ \sqrt{(b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)}}{\|b'\|_2^2},\; r_{-}=\frac{b^Tb'- \sqrt{(b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)}}{\|b'\|_2^2} $$
This is of course the same as
$$\frac{\lambda}{1+\lambda \;Tr(A^{-1})}= r_{+},r_{-}.$$ From here we find that
$$\lambda_{+}= \frac{r_{+}}{1-r_{+}Tr(A^{-1})} \textrm{ and } \lambda_{-}= \frac{r_{-}}{1-r_{-}Tr(A^{-1}}$$ solve the equation, provided that $r_{+}\neq \frac{1}{Tr(A^{-1})}$ and $r_{-}\neq \frac{1}{Tr(A^{-1})}$ respectively. If one of this cases holds, we miss that solution. In conclusion, the only possible values of $\lambda$ so that the associated $a$ is $KKT$ are as above under the conditions $D\geq 0$ and $r_{+},r_{-}\neq \frac{1}{Tr(A^{-1})}.$
For the remaining case ($-\lambda \in \sigma(A)$) I have some ideas also. However, I don't think that a closed solution is possible. If I have some more time, I will think about it. Hope this helps.