I have 2 vectors $ a, b \in \mathbb{R}^{k} $, which can be thought of as feature vectors in machine learning.
Using a simple transformation (linear, affine, concatenation, ...), I want to combine them into a single vector $ c \in \mathbb{R}^{l} $ with $ l < k $, while keeping as much information as possible.
If $ l \geq 2k $, then I think I can just concatenate them. But if $ l < k $, what should I do?
- Should I concatenate them, then multiply by a matrix to reduce the dimension to $ l $?
- Or should I multiply each of them by a matrix to reduce the dimension to $ l/2 $, then concatenate the results?
- Or should I multiply each of them by a matrix to reduce the dimension to $ l $, then average the results?

Or does the order of these operations not matter?
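
To make the three options concrete, here is a minimal NumPy sketch of what I have in mind. The sizes `k = 8` and `l = 4` and all the matrices (`W`, `Wa_half`, `Wb_half`, `Wa`, `Wb`) are placeholders I made up for illustration; in practice the matrices would be learned or chosen some other way, and random values are used here only to show the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l = 8, 4  # hypothetical sizes with l < k (l even, so l/2 is an integer)

a = rng.normal(size=k)
b = rng.normal(size=k)

# Option 1: concatenate (2k entries), then project down to l.
W = rng.normal(size=(l, 2 * k))  # placeholder projection matrix
c1 = W @ np.concatenate([a, b])

# Option 2: project each vector to l/2, then concatenate.
Wa_half = rng.normal(size=(l // 2, k))  # placeholder
Wb_half = rng.normal(size=(l // 2, k))  # placeholder
c2 = np.concatenate([Wa_half @ a, Wb_half @ b])

# Option 3: project each vector to l, then average.
Wa = rng.normal(size=(l, k))  # placeholder
Wb = rng.normal(size=(l, k))  # placeholder
c3 = (Wa @ a + Wb @ b) / 2

# All three results live in R^l.
print(c1.shape, c2.shape, c3.shape)
```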