Am I correct in finding correlations

Asked Jun 14 '17 at 14:38

I want to perform feature selection on a dataset with 128 real-valued, standardized features and binary (1/0) labels. Below are the density histograms of feature a5 for Class 1 and Class 0. The classes are imbalanced: Class 1 makes up only about 5% of the samples.

Next, I subtract the right curve from the left one to remove the part of the distribution that is "normal" with respect to this feature, i.e. common to both classes. My reasoning is that if 50% of the feature values lie in, say, [0.1, 0.2] in BOTH classes, then that range says nothing special about my classes. That is why I do the subtraction. It should leave me with the ranges (where the resulting curve is positive) in which this feature contributes to choosing Class 0. Is this assumption correct?
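For reference, the subtraction described above could be sketched roughly as follows. This is a minimal illustration with NumPy, not the exact code behind the plots: the function name `density_difference`, the bin count, and the synthetic data are all assumptions. The sign convention (Class 0 minus Class 1) is also assumed, chosen so that positive values point to Class 0.

```python
import numpy as np

def density_difference(x, y, bins=20):
    """Return shared bin edges and the Class 0 minus Class 1 density per bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Both histograms must use the same bin edges, otherwise the
    # bin-by-bin subtraction is meaningless.
    edges = np.histogram_bin_edges(x, bins=bins)
    d0, _ = np.histogram(x[y == 0], bins=edges, density=True)
    d1, _ = np.histogram(x[y == 1], bins=edges, density=True)
    return edges, d0 - d1

# Synthetic stand-in for one feature (assumption): about 5% of samples
# are Class 1, and Class 1 is shifted to the right by one unit.
rng = np.random.default_rng(0)
y = (rng.random(20000) < 0.05).astype(int)
x = rng.normal(loc=np.where(y == 1, 1.0, 0.0), scale=1.0)

edges, diff = density_difference(x, y)
widths = np.diff(edges)
centers = 0.5 * (edges[:-1] + edges[1:])
```

Because each histogram is normalized to integrate to 1 (`density=True`), the 5% class imbalance does not affect the comparison, and the difference curve integrates to zero: positive regions favor Class 0 and negative regions favor Class 1.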

I then build these difference graphs for all features. Below are the graphs for two different features. I suppose that the left one is better than the right one, because it gives a clearer distinction between the classes (the right one is noisier), so the right feature could be removed if I need to reduce the number of features. Is this correct?
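Instead of comparing the graphs by eye, one possible way to put a number on "clearer distinction" is to integrate the absolute value of the difference curve. The sketch below does that; it is my own assumption about how to quantify it (half the integrated absolute difference, an estimate of the total variation distance), and the function name `separation_score` and the synthetic two-feature matrix are hypothetical.

```python
import numpy as np

def separation_score(x, y, bins=20):
    """Half the integrated absolute density difference (0 = identical, 1 = disjoint)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    edges = np.histogram_bin_edges(x, bins=bins)
    d0, _ = np.histogram(x[y == 0], bins=edges, density=True)
    d1, _ = np.histogram(x[y == 1], bins=edges, density=True)
    return 0.5 * np.sum(np.abs(d0 - d1) * np.diff(edges))

# Synthetic stand-in (assumption): column 0 is shifted for Class 1,
# column 1 is pure noise and carries no class information.
rng = np.random.default_rng(1)
n = 20000
y = (rng.random(n) < 0.05).astype(int)
X = np.column_stack([
    rng.normal(loc=1.5 * y, scale=1.0),  # informative feature
    rng.normal(size=n),                  # uninformative feature
])

scores = np.array([separation_score(X[:, j], y) for j in range(X.shape[1])])
ranking = np.argsort(scores)[::-1]  # feature indices, best separation first
```

One caveat with this kind of score: with only about 5% of samples in Class 1, its histogram is noisy, so even an uninformative feature gets a score somewhat above zero. Fewer bins, or comparing against scores from shuffled labels, would probably help when judging whether a feature is really "noisier".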

noname7619

In the last figure, is the scale the same in both graphs? – noe Jun 14 '17 at 14:52
Apart from that, why don't you compute the numeric values of the correlation coefficient $\rho$? – noe Jun 14 '17 at 18:20
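The correlation coefficient suggested in the comment above can be computed per feature against the 1/0 labels (for a binary label, Pearson's $\rho$ is the point-biserial correlation). A minimal sketch, assuming the features sit in a NumPy array `X` of shape (samples, features) and the labels in `y`; the helper name `label_correlations` and the synthetic data are hypothetical:

```python
import numpy as np

def label_correlations(X, y):
    """Pearson correlation of every column of X with the binary label y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)   # center each feature
    yc = y - y.mean()         # center the labels
    # Note: a constant feature would give a zero denominator here.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom

# Synthetic stand-in (assumption): column 0 depends on the label, column 1 does not.
rng = np.random.default_rng(2)
n = 20000
y = (rng.random(n) < 0.05).astype(int)
X = np.column_stack([
    rng.normal(loc=1.5 * y, scale=1.0),
    rng.normal(size=n),
])

rho = label_correlations(X, y)
order = np.argsort(np.abs(rho))[::-1]  # features sorted by |rho|, strongest first
```

Note that $\rho$ only measures a linear relationship, which for a binary label amounts to a shift in the class means. A feature whose classes differ in spread or shape but not in mean can have $\rho$ near zero and still show a clear structure in the histogram difference.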
