
Let's say I am working on handwritten digit recognition (0 to 9). I know, for instance, that if I use clustering then I need to look for 10 clusters. But once I have the 10 clusters, how do I automatically identify which cluster corresponds to which digit? More generally, say I have some other classification task where I don't have labels for the training data, but I know that there are, let's say, two classes. I want to build a model that can tell me which class each test instance belongs to. The same problem arises: even if I can group the training cases together, how do I know the actual label of a test instance in this unsupervised setting?

1 Answer

You can't, in a completely unsupervised setting. You need something more: e.g., a training data set where you have labels for at least some of the instances, or some other information that allows you to set a label for each cluster.

How is the computer supposed to know that a circle means zero as opposed to some other number, if you don't tell it that? It can't. In fact, in the Eastern Arabic numeral system, there's a zero-looking character that actually means the number 5, not the number 0. A computer has no way of knowing which of those two meanings is intended unless you tell it somehow.
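As an illustration of "labels for at least some of the instances", here is a minimal sketch that names each cluster by majority vote among the few instances whose labels are known. The function name `name_clusters`, the toy cluster ids, and the hand-assigned labels are all made up for this example; the cluster ids could come from any clustering algorithm.

```python
from collections import Counter, defaultdict


def name_clusters(cluster_ids, known_labels):
    """Map each cluster id to the majority label among its labeled members.

    cluster_ids:  one cluster id per instance (output of any clustering step).
    known_labels: dict {instance index: label} for the few labeled instances.
    """
    votes = defaultdict(Counter)
    for idx, label in known_labels.items():
        votes[cluster_ids[idx]][label] += 1
    # A cluster with no labeled member gets no name: nothing tells us what it means.
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Toy data (hypothetical): six instances in three clusters, three labeled by hand.
cluster_ids = [0, 0, 1, 1, 2, 2]
known_labels = {0: "zero", 2: "five", 3: "five"}

mapping = name_clusters(cluster_ids, known_labels)
# Instances in an unnamed cluster get None instead of a guessed label.
predicted = [mapping.get(c) for c in cluster_ids]
```

Cluster 2 contains no labeled instance, so it stays unnamed, which is the point of the answer: without outside information there is nothing to attach a meaning to.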

D.W.
  • To be precise, you don't need a training set to assign an object to a specific class. You need classification rules that can be obtained from a training set. – Marcin Król Dec 11 '15 at 09:13
  • Could you please elaborate? – user3676846 Dec 11 '15 at 13:10
  • I had a classification task where I was supposed to predict which teachers stayed at a particular school and which left for better opportunities. The problem was that I only had data on the current teachers at the school, so I tried asking some teachers to answer as if they had the attributes of someone likely to leave. Given such a setting, would it make sense to first cluster using, say, k-means with k=2, then somehow determine which cluster denotes the teachers who left, assign labels based on that, and then treat this as a supervised task? – user3676846 Dec 11 '15 at 13:15