
The independent variables in my dataset include categorical variables such as

  • Gender (2 levels)
  • Mode of Shipment (3 levels)
  • Product Importance (4 levels)

and numerical variables such as

  • Customer care calls
  • Discount Offered
  • Package weight

How do I find the correlations between these variables?

  1. Convert the categorical variables into dummy variables and then use Pearson correlation? What if the dummy columns of a single variable are also correlated with each other, for example the Mode of Shipment categories Flight, Ship and Road? Do I need to remove a dummy category that is highly correlated with another category of the same variable? Or
  2. Compute separate correlations: Pearson correlation between the numerical variables, and chi-square statistics between the categorical variables?
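To make the two options concrete, here is a minimal sketch in Python with pandas and SciPy. The data is synthetic and the column names (`gender`, `shipment_mode`, `customer_care_calls`, `discount_offered`, `package_weight`) are placeholders I made up, not the real dataset. The helper `cramers_v` is also my own name; it rescales the chi-square statistic to the range 0 to 1 (Cramér's V), which is one common way to turn a chi-square test into a correlation-like number, not the only one.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic stand-in data; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "shipment_mode": rng.choice(["Flight", "Ship", "Road"], size=n),
    "customer_care_calls": rng.integers(2, 8, size=n),
    "discount_offered": rng.integers(1, 66, size=n),
    "package_weight": rng.normal(3500, 1500, size=n),
})

# Option 1: dummy-encode everything, then one Pearson correlation matrix.
dummies = pd.get_dummies(df, columns=["gender", "shipment_mode"], dtype=float)
corr_all = dummies.corr(method="pearson")

# Dummy columns of the same variable are mutually exclusive, so they are
# negatively correlated with each other by construction.
mode_cols = ["shipment_mode_Flight", "shipment_mode_Road", "shipment_mode_Ship"]
mode_corr = corr_all.loc[mode_cols, mode_cols]

# Option 2a: Pearson correlation between the numerical variables only.
num_cols = ["customer_care_calls", "discount_offered", "package_weight"]
corr_num = df[num_cols].corr(method="pearson")


# Option 2b: chi-square test between two categorical variables,
# rescaled to Cramér's V so that 0 means no association and 1 a perfect one.
def cramers_v(x, y):
    table = pd.crosstab(x, y).to_numpy()
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (table.sum() * k)), p_value


v, p_value = cramers_v(df["gender"], df["shipment_mode"])
print(mode_corr.round(2))
print(corr_num.round(2))
print(f"Cramér's V = {v:.3f}, chi-square p-value = {p_value:.3f}")
```

In this sketch the negative correlations inside `mode_corr` come purely from the encoding (a shipment cannot be Flight and Ship at once), so they do not say anything about the data.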

How should I go about it?

Thank you! It's a long question, but I really need clarity on this. I would appreciate any additional links too. Thanks again!

  • Welcome to the site. Here are two links to searches that give the answer: https://datascience.stackexchange.com/search?q=correlation+categorical and https://stats.stackexchange.com/search?q=correlation+categorical. Specifically, here is a good answer (ANOVA): https://datascience.stackexchange.com/questions/893/how-to-get-correlation-between-two-categorical-variable-and-a-categorical-variab?rq=1 Regarding removal of a dummy variable: dropping one dummy level is required mathematically to avoid the dummy variable trap in linear/logistic regression and similar models that include an intercept. – Craig Apr 18 '21 at 19:18
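The dummy variable trap mentioned in the comment can be illustrated with a small sketch. The six shipment values are made up for the example; `drop_first=True` is the pandas `get_dummies` option that removes one level. With an intercept column and all three dummies, the columns are linearly dependent (the dummies sum to the intercept), so the design matrix loses rank.

```python
import numpy as np
import pandas as pd

# Made-up sample of a 3-level categorical variable.
modes = pd.Series(["Flight", "Ship", "Road", "Ship", "Flight", "Road"])

# Intercept + all three dummies: 4 columns, but the dummies sum to the intercept.
full = pd.get_dummies(modes, dtype=float)
X_full = np.column_stack([np.ones(len(modes)), full.to_numpy()])

# Intercept + two dummies (first level dropped): 3 independent columns.
reduced = pd.get_dummies(modes, drop_first=True, dtype=float)
X_reduced = np.column_stack([np.ones(len(modes)), reduced.to_numpy()])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)
print(X_full.shape[1], rank_full)        # 4 columns, rank 3
print(X_reduced.shape[1], rank_reduced)  # 3 columns, rank 3
```

So the level is dropped because of this exact linear dependence with the intercept, not because of a high pairwise correlation between two dummy columns.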
