
I have a dataset in which only 2 of the 9 attributes are categorical. How can I run a clustering analysis on it? I am using R. Do you have any advice on how to do it, or on instructions and topics to look into? Here's my dataset.

Thanks

user96624

2 Answers


Here is a nice walkthrough of clustering mixed-type data in R:

https://dpmartin42.github.io/posts/r/cluster-mixed-types

This related question: K-Means clustering for mixed numeric and categorical data

And a discussion thread on Kaggle:

https://www.kaggle.com/general/19741

There are two broad approaches: either map your categorical data to a numeric type and then proceed as usual, or choose a similarity measure that works for categorical data, for example one based on counting frequencies.
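
As a rough illustration of the second approach, here is a minimal sketch that uses Gower dissimilarity with partitioning around medoids (PAM), which is one common choice for mixed data and not the only one. The data frame `toy` and its column names are made up for the example; `daisy()` and `pam()` come from the `cluster` package.

```r
library(cluster)  # provides daisy() and pam()

set.seed(42)

# Hypothetical toy data: 3 numeric and 2 categorical columns (names are made up)
toy <- data.frame(
  income = c(rnorm(10, 30, 2), rnorm(10, 60, 2)),
  age    = c(rnorm(10, 25, 2), rnorm(10, 50, 2)),
  score  = c(rnorm(10, 1, 0.2), rnorm(10, 5, 0.2)),
  region = factor(rep(c("north", "south"), each = 10)),
  plan   = factor(rep(c("basic", "premium"), each = 10))
)

# Gower dissimilarity handles numeric and factor columns together
gower_dist <- daisy(toy, metric = "gower")

# PAM on the precomputed dissimilarities; k = 2 is an assumption for this toy data
fit <- pam(gower_dist, k = 2, diss = TRUE)

# Cross-tabulate cluster labels against one of the categorical columns
table(fit$clustering, toy$region)
```

On real data the number of clusters `k` is not known in advance; comparing a few values of `k` (for instance by average silhouette width) is a reasonable starting point.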

BlackCurrant

You will need some way of converting the categorical data to numerical, or the numerical data to categorical. One way to convert categorical to numerical is one-hot encoding: count the categories an attribute can take and make a vector of that length. Each data point is then mapped to a vector that is 0 everywhere except at the position of its category, where it is 1.

A more detailed explanation of one-hot encoding is here: https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
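
In R, one way to sketch the one-hot encoding described above is with base R's `model.matrix()`. The data frame `df` and its columns are hypothetical, and scaling the numeric column is an extra assumption added so that it is on a comparable footing with the 0/1 columns.

```r
# Hypothetical data frame with one numeric and one categorical column
df <- data.frame(
  height = c(1.2, 3.4, 2.2, 0.7),
  colour = factor(c("red", "green", "blue", "green"))
)

# "- 1" drops the intercept so every level gets its own 0/1 column
onehot <- model.matrix(~ colour - 1, data = df)

# Combine the scaled numeric column with the indicator columns
encoded <- cbind(height = as.numeric(scale(df$height)), onehot)
encoded
```

The resulting numeric matrix `encoded` can then be passed to a standard clustering function such as `kmeans()`.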

chenjesu