Here's the question:
You have a dataset $\{\mathbf{x}\}$ of $N$ vectors, each of which is $d$-dimensional. Assume $\mathbf{mean}(\{\mathbf{x}\})=0$. Consider a linear function of the data, defined by some vector $\mathbf{a}$, which is evaluated on each data item as $f_i(\mathbf{a})=\mathbf{a}^T\mathbf{x}_i$.
Show that
"Maximize $\mathbf{Var}(\{f(\mathbf{a})\})$ subject to $\mathbf{a}^T\mathbf{a} = 1$"
is solved by the eigenvector of $\mathbf{Covmat}(\{\mathbf{x}\})$ corresponding to the largest eigenvalue.
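(Before attempting the proof, I convinced myself the claim is true with a quick numerical sanity check on made-up data — this is just my own sketch, not part of the problem:)

```python
import numpy as np

# Sanity check on synthetic data: among unit vectors a, the top eigenvector
# of Covmat({x}) should maximize Var({f(a)}) = Var({a^T x_i}).
rng = np.random.default_rng(0)
N, d = 500, 4
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))  # correlated data
X = X - X.mean(axis=0)                                  # enforce mean({x}) = 0

C = (X.T @ X) / N                    # Covmat({x})
eigvals, eigvecs = np.linalg.eigh(C) # eigenvalues in ascending order
a_star = eigvecs[:, -1]              # eigenvector for the largest eigenvalue

def proj_var(a):
    """Var({f(a)}) = Var({a^T x_i})."""
    return np.var(X @ a)

# No other unit vector does better than a_star:
for _ in range(1000):
    a = rng.normal(size=d)
    a /= np.linalg.norm(a)
    assert proj_var(a) <= proj_var(a_star) + 1e-9

# The maximum value achieved is the largest eigenvalue itself:
assert np.isclose(proj_var(a_star), eigvals[-1])
```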
What I know/understand:
For $\mathbf{a}^T\mathbf{a} = 1$ to hold, $\mathbf{a}$ must be a unit vector — it's a vector, not an orthogonal matrix. (Rusty on my matrix properties, so still trying to figure out how this might help me.)
$\mathbf{Var}(\{f(s\mathbf{a})\}) = s^2\,\mathbf{Var}(\{f(\mathbf{a})\})$, something I proved in a previous question. I'm pretty sure this is supposed to help, but I haven't figured out the connection yet.
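(For my own peace of mind, I also checked this scaling property numerically on random data — again just a sketch with made-up numbers:)

```python
import numpy as np

# Check Var({f(s*a)}) = s^2 * Var({f(a)}) on synthetic, centered data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)          # mean({x}) = 0

a = rng.normal(size=3)          # an arbitrary direction
s = 2.5                         # an arbitrary scale factor
assert np.isclose(np.var(X @ (s * a)), s**2 * np.var(X @ a))
```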
I believe this can be maximized using Lagrange multipliers, where $f=\mathbf{Var}(\{f(\mathbf{a})\})$, $g=\mathbf{a}^T\mathbf{a}$, and $c=1$, giving us the Lagrangian $L = \mathbf{Var}(\{f(\mathbf{a})\})-\lambda(\mathbf{a}^T\mathbf{a} - 1)$.
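From a quick refresher, I think the relevant matrix-calculus gradient identities are (please correct me if I have these wrong):

$$\nabla_{\mathbf{a}}\left(\mathbf{a}^T M \mathbf{a}\right) = (M + M^T)\,\mathbf{a} = 2M\mathbf{a} \;\text{ for symmetric } M, \qquad \nabla_{\mathbf{a}}\left(\mathbf{a}^T\mathbf{a}\right) = 2\mathbf{a}.$$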
I understand that I need to take the gradient of the Lagrangian and set it equal to zero in order to maximize the given function. However, it has been some time since I have done any calculus/linear algebra, so I am not fully sure how to go about doing this, especially in such a general sense where our linear function is defined by an arbitrary vector $\mathbf{a}$.
I believe I am really close to piecing this together and it would be really helpful if someone could help me go in the right direction. Thanks!