
I'm curious: for the more experienced data scientists, have you ever used the t-test, ANOVA, Wilcoxon, etc.?

Basically, my question is: do you perform inference tasks, or purely prediction tasks (machine learning)?

Well, the t-test and ANOVA are widely used when one wants to perform feature selection and/or analyze feature importance. But could you please elaborate a bit further? – Giannis Krilis Feb 07 '20 at 09:56

3 Answers


The t-test and ANOVA are used pretty often, mostly in statistical data analysis, which is a "must know" for a data scientist but not necessarily part of their everyday work. The more you move toward medical/biostatistics or the social sciences, the more you see them used.

In a data scientist's everyday work, feature selection, for example, is one situation where ANOVA helps. For instance, imagine numerical features and discrete classes in a classification problem. One way to select good features is to compare each feature's values across the classes and test whether the class means differ significantly.
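A minimal sketch of that per-feature comparison, assuming numeric features stored as a list of rows with one class label per row: for each feature it computes the one-way ANOVA F statistic (spread of the class means divided by the spread inside each class) and ranks the features by it. The helper names `anova_f` and `rank_features` are hypothetical and the toy data is made up for illustration. In practice, `scipy.stats.f_oneway` returns the same F statistic together with a p-value, and scikit-learn's `sklearn.feature_selection.f_classif` computes it for every column at once.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for k groups of numbers (hypothetical helper)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    grand = mean([x for g in groups for x in g])
    # Between-class sum of squares: spread of the class means around the grand mean
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-class sum of squares: spread of values around their own class mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Return (feature_index, F) pairs sorted by decreasing F (hypothetical helper)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        # Split column j into one group of values per class
        groups = [[row[j] for row, label in zip(X, y) if label == c] for c in classes]
        scores.append((j, anova_f(groups)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # feature 0 comes first; feature 1's class means are equal, so its F is 0.0
```

A large F means the class means are far apart relative to the spread inside each class. Because ANOVA compares means, a feature whose classes differ only in variance or shape can still score near zero.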

I also used the Wilcoxon test once, in a small-sample project where I was trying to recognize patterns in complex networks built from the time series of two control groups. The goal was to see which features of those networks (centrality measures, clustering coefficient, average path length, etc.) make a significant difference between the groups, so that they could be used for recognition, i.e. a classification problem.

The difference in question was between the distributions of feature values in the two groups, and those distributions were not necessarily Gaussian. Wilcoxon helped a lot there: some differences between the two groups that looked striking for certain features were actually not statistically significant, and without such a test the analysis could have gone the wrong way.
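Assuming the two groups are independent samples (the answer does not say whether they were paired), the relevant variant is the Wilcoxon rank-sum test, equivalent to the Mann-Whitney U test; the signed-rank test is the paired version. The sketch below computes an exact two-sided p-value by enumerating every way of assigning the pooled ranks to the first group, which is only practical for small samples like the one described. The function names are hypothetical; SciPy's `scipy.stats.mannwhitneyu` is the usual library route.

```python
from itertools import combinations


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_exact(a, b):
    """Exact two-sided Wilcoxon rank-sum test (hypothetical helper); returns (W, p)."""
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    n1 = len(a)
    w_obs = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (len(pooled) + 1) / 2  # E[W] under the null hypothesis
    dev_obs = abs(w_obs - expected)
    extreme = total = 0
    # Under the null, every subset of n1 ranks is equally likely to be group a
    for idx in combinations(range(len(pooled)), n1):
        w = sum(ranks[i] for i in idx)
        total += 1
        if abs(w - expected) >= dev_obs - 1e-9:
            extreme += 1
    return w_obs, extreme / total


print(rank_sum_exact([1, 2, 30], [3, 4, 5]))  # (9.0, 0.7)
```

The call mirrors the situation described in the answer: the first group's mean (11) is far above the second's (4) because of one extreme value, yet the ranks barely differ and the test finds no significant difference.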

Hope it helped!

Kasra Manshaei

Things I still use in my everyday work:

  • t-test
  • Imputation techniques for dealing with missing data (such a pain!)
  • ACF and PACF plots for time series data
  • Standardization techniques (e.g. Z-scores)
  • Regression diagnostics (less often)
  • Related to the one above: the Shapiro-Wilk test for normality of a distribution
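Two of the items above are simple enough to sketch directly: Z-score standardization and the sample autocorrelations that an ACF plot displays. This is a minimal pure-Python version using the common estimator that divides every lag by the lag-0 sum of squares; the function names are hypothetical. The PACF, regression diagnostics and the Shapiro-Wilk test are not reimplemented here (`scipy.stats.shapiro` and statsmodels' `plot_acf` / `plot_pacf` cover them in practice).

```python
from statistics import mean, pstdev


def z_scores(xs):
    """Standardize: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def acf(xs, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (the bars of an ACF plot)."""
    n = len(xs)
    mu = mean(xs)
    denom = sum((x - mu) ** 2 for x in xs)  # lag-0 sum of squares
    return [
        sum((xs[t] - mu) * (xs[t + k] - mu) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(acf([1, -1, 1, -1, 1, -1], 2))  # [1.0, -0.8333333333333334, 0.6666666666666666]
```

The alternating series shows the typical ACF signature of a sign-flipping process: strongly negative at odd lags and positive at even lags, shrinking in magnitude because fewer pairs contribute at larger lags.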

I'll keep editing this answer, adding things as they come up.

Leevo

As an entry-level analyst, I use these concepts in my day-to-day work:

  • t-test
  • ANOVA
  • Chi-square test
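A minimal sketch of two of these tests without third-party libraries: Welch's t statistic (the unequal-variance form of the two-sample t-test) with its Welch-Satterthwaite degrees of freedom, and Pearson's chi-square test of independence on a contingency table. The helper names are hypothetical. Turning a t statistic into a p-value needs the t distribution (`scipy.stats.ttest_ind(a, b, equal_var=False)` does both steps), so only the 1-degree-of-freedom chi-square p-value is computed here, via the identity P(χ²₁ ≥ x) = erfc(√(x/2)). No continuity correction is applied; `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so pass `correction=False` to compare.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square with 1 dof: P(Z**2 >= stat)."""
    return math.erfc(math.sqrt(stat / 2))


print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(stat, dof, p_value_df1(stat))
```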
Rohan Shetty