
Suppose we have a dataset $\{x^{(i)}, y^{(i)}\}_{i = 1}^N$ where $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{0, 1\}$ for simplicity.

Our main goal is to apply some unsupervised learning algorithm to each $x^{(i)}$ and interpret the results, which we can call $u^{(i)}$. I have in mind applications where the unsupervised algorithm is designed to infer something intangible and unobservable that nonetheless has a meaningful physical interpretation — for example, ICA as a means of recovering independent sources.

Benchmarking the unsupervised algorithm in a meaningful way is difficult, since we have no ground truth to compare $u^{(i)}$ against. My idea for adding another perspective to this problem is to train a supervised classifier on the derived dataset $\{u^{(i)}, y^{(i)}\}_{i = 1}^N$, i.e. just use the results of the unsupervised algorithm as features. If a classifier on this derived dataset performs well, it would provide some evidence that the results $u^{(i)}$ of the unsupervised algorithm are actually capturing some meaningful structure in each $x^{(i)}$. If $u^{(i)}$ were essentially spurious, we would not expect this second classifier to perform much above chance.
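To make the proposal concrete, here is a minimal sketch of the pipeline using scikit-learn, with FastICA standing in for the unsupervised algorithm and logistic regression as the second-stage classifier. The synthetic data, the choice of ICA, and the permuted-label control are all illustrative assumptions, not part of any established benchmark:

```python
# Sketch of the proposed benchmark: recover u^(i) with an unsupervised
# algorithm (here FastICA, as one example), then test whether a supervised
# classifier can predict y from u alone. Synthetic data stands in for a
# real dataset; labels are tied to genuine linear structure in X.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N, n = 500, 10
X = rng.normal(size=(N, n))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # y depends on real structure in X

# Unsupervised step: u^(i) = ICA components of x^(i)
U = FastICA(n_components=n, random_state=0).fit_transform(X)

# Supervised check: can y be predicted from u alone?
clf = LogisticRegression(max_iter=1000)
score = cross_val_score(clf, U, y, cv=5).mean()

# Control: the same classifier trained against permuted labels should sit
# near chance, giving a baseline for what "spurious" performance looks like.
y_perm = rng.permutation(y)
baseline = cross_val_score(clf, U, y_perm, cv=5).mean()
print(f"ICA features: {score:.2f}  permuted-label baseline: {baseline:.2f}")
```

The permuted-label control matters here: comparing the classifier's score against a chance baseline (rather than judging the raw accuracy in isolation) is what separates "the classifier found the structure in $u^{(i)}$" from "the classifier overfit to noise."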

Does this sound like a reasonable means of comparison? Is anyone aware of existing work that benchmarks unsupervised algorithms in this way?

Putting the question more generally: by what means can we evaluate whether or not $u^{(i)}$ is providing useful or interpretable information, as opposed to being some artificially constructed or spurious statistic?

RJTK
  • Generative Adversarial Networks (GANs) as described here could be seen as a special case of this (although the supervised classifier in a GAN does not learn only from $u$ and $y$, but also from $x$) – Jonathan Feb 29 '20 at 19:19
  • Not sure if there is enough context. But if you have $y$, why not use a supervised learning method in the first place? – Lucas Morin Mar 01 '20 at 13:03
  • Because my goal is to lend evidence to the hypothesis that $u^{(i)}$ is actually telling us something about $x^{(i)}$, as opposed to being somewhat spurious or uninterpretable. Consider a case where our datasets of interest usually consist only of $x^{(i)}$, but in some cases we also get a $y^{(i)}$. If a classifier can predict $y^{(i)}$ from $u^{(i)}$ in these cases, then I think it suggests that $u^{(i)}$ contains scientifically relevant information about $x^{(i)}$, including in the datasets where $y^{(i)}$ is not available. – RJTK Mar 01 '20 at 14:03

0 Answers