Highest Voted 'statistics' Questions - Data Science Stack Exchange

11

votes

2 answers

Why use bootstrapping?

The wiki page for bootstrapping says that you use it in the case where the underlying distribution is unknown. Why is bootstrapping, or sampling with replacement, better than just calculating the variance and other properties from the data directly?

statistics

asked Feb 26 '16 at 06:04

sebastianspiegel

891
4
11
16
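
For the bootstrapping question above, a minimal sketch of one common use: estimating the standard error of a statistic, such as the median, that has no simple closed-form variance. The data, the function name `bootstrap_se`, and the number of resamples are illustrative assumptions, not part of the original question.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.median, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)  # draw n points with replacement
        estimates.append(stat(resample))
    # The spread of the resampled statistics approximates its sampling variability.
    return statistics.stdev(estimates)

sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]  # made-up data
se_median = bootstrap_se(sample)
print(f"median = {statistics.median(sample):.2f}, bootstrap SE = {se_median:.2f}")
```

The same function works for any statistic passed as `stat`, which is the main practical appeal: no distribution-specific formula is needed.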

7

votes

2 answers

Data Science as a Social Scientist?

As I am very interested in programming and statistics, Data Science seems like a great career path to me: I like both fields and would like to combine them. Unfortunately, I studied political science and have a non-statistical-sounding Master's. I…

statistics

asked Jun 13 '14 at 07:28

Christian Sauer

517
3
6

6

votes

3 answers

Are Undergraduate Statistics Concepts Used in Practice?

A question for more experienced data scientists: have you ever used a t-test, ANOVA, Wilcoxon, etc.? Basically, my question is whether you perform inference tasks or purely prediction tasks (machine learning).

statistics

asked Feb 07 '20 at 07:18

datascientist102910291029

61
1

5

votes

2 answers

What is a logworth statistic, and how useful is it?

My teacher mentioned it today, and there are nearly no good search results for it, other than one mention each in the SAS and JMP documentation. It says it is -log10(p-value), but there are almost no explanations of this online. Also it seems like…

statistics

asked Jan 29 '20 at 00:46

Gabriel Fair

257
3
8
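
For the logworth question above, the definition quoted in the excerpt, -log10(p-value), can be sketched directly. The function name `logworth` and the input check are assumptions made for illustration.

```python
import math

def logworth(p_value):
    """Return -log10(p_value), the definition quoted in the question."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)

# Smaller p-values map to larger logworth values.
for p in (0.05, 0.01, 0.001):
    print(p, round(logworth(p), 3))
```

On this scale a p-value of 0.01 corresponds to a logworth of 2, so very small p-values become easier to compare and plot.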

5

votes

1 answer

Standardize numbers for ranking ratios

I'm trying to rank some percentages. I have numerators and denominators for each ratio. To give a concrete example, consider the ratio total graduates / total students in a school. But the issue is that the total number of students varies over a wide range…

statistics

asked Jun 25 '14 at 02:33

Rohit Mittal

53
2
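
For the ratio-ranking question above, one possible approach (not necessarily the one given in the answers) is to rank by the lower bound of the Wilson score interval, which penalizes ratios computed from small denominators. The function name, the z value of 1.96, and the school counts are assumptions for illustration.

```python
import math

def wilson_lower_bound(successes, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion."""
    if total == 0:
        return 0.0
    p_hat = successes / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)

# Made-up (graduates, students) counts with the same raw ratio of 0.9.
schools = {"small": (9, 10), "large": (900, 1000)}
ranked = sorted(schools, key=lambda s: wilson_lower_bound(*schools[s]), reverse=True)
print(ranked)
```

Both schools have the same raw ratio, but the larger one ranks first because its estimate is supported by more data.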

5

votes

2 answers

Methods for standardizing / normalizing different rank scales

I know the usual way to standardize data is to subtract the mean and divide by the standard deviation, but I'm interested to know if there are more appropriate methods for this kind of discrete data. Consider the following case. I have 5…

statistics

asked Oct 10 '14 at 00:59

Climbs_lika_Spyder

400
1
3
8
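
For the rank-scale question above, the baseline the asker mentions (subtract the mean, divide by the standard deviation) can be sketched as follows. The function name `zscores` and the sample ranks are assumptions, and the sample standard deviation is used here as one of two common choices.

```python
import statistics

def zscores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

ranks = [1, 2, 3, 4, 5]  # made-up ranks on a single scale
z = zscores(ranks)
print([round(v, 3) for v in z])
```

Whether this is appropriate for discrete rank data is exactly what the question asks; the sketch only shows the baseline being compared against.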

4

votes

1 answer

How to find the probability of one or more events happening from an incomplete data set

I have a dataset that gives information about a population. For instance, I know the fraction of people that are males (M) and that are within a certain age range (A), P(M & A), and then I know the fraction of males that live in a certain area (L), P(M…

statistics

asked Nov 16 '15 at 15:40

Brian

143
2

4

votes

4 answers

Statistics - Train and test data split

How much data should we use for training, and how much for testing? Can anyone explain why it always seems to be a 70:30 or 80:20 ratio?

statistics

asked Mar 03 '17 at 09:02

Shyama

91
1
2
8
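
For the train/test split question above, a minimal sketch of a shuffled 80:20 split using only the standard library. The function name, the default `test_fraction`, and the toy data are assumptions; the ratio itself is a convention rather than a rule.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(100), test_fraction=0.2)
print(len(train), len(test))
```

Changing `test_fraction` to 0.3 gives the 70:30 variant mentioned in the question.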

2

votes

1 answer

Compare between similar and dissimilar couples of instances

I label couples of similar and dissimilar instances based on user behavior. Each instance has a lot of features. I have a few ways of labeling the couples. I now want to evaluate which of the labeling methods produces the most homogeneous distribution in…

statistics

asked Jan 12 '20 at 15:59

anat

155
4

2

votes

1 answer

How can I show the relations between travel destinations?

I'm trying to do a project about email marketing. I'm working at a tourism company and I want to suggest the best destinations to clients. But I need to see the relations between destinations. Example: How many people visited Dublin and…

statistics

asked Jun 25 '15 at 12:50

Uygar Yologlu

23
2
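
For the travel-destinations question above, one simple way to see relations between destinations is to count how many clients visited each pair. This is a sketch under assumptions: the `visits` mapping, the client IDs, and the cities other than Dublin are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: client ID -> set of destinations visited.
visits = {
    "u1": {"Dublin", "Paris"},
    "u2": {"Dublin", "Paris", "Rome"},
    "u3": {"Rome"},
}

pair_counts = Counter()
for destinations in visits.values():
    # Sort so each pair has one canonical ordering, e.g. ("Dublin", "Paris").
    for pair in combinations(sorted(destinations), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

The resulting counts can be read as a co-occurrence table, which is a common starting point for suggesting a destination based on one already visited.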

2

votes

2 answers

How should I create a single score with two values as input?

I have two series of values, a and b, as inputs and I want to create a score, c, which reflects both of them equally. The distributions of a and b are below. In both cases, the x-axis is just an index. How should I go about creating an equation c =…

statistics

asked May 03 '15 at 07:11

Eric Baldwin

123
3

2

votes

2 answers

Correlation between time to event data and continuous data

I want to measure the correlation between survival time, which is time-to-event data, and the patient's activity count, which is measured on a continuous scale. What type of correlation coefficient is available to measure the strength of these two…

statistics

asked Apr 26 '15 at 11:09

ASJRM

21
2

2

votes

1 answer

How to test the influence of a feature on conversion?

I have a user journey where I have data of the format: userID, did_interact_with_feature(0/1), did_convert(0/1). I want to verify the hypothesis that a user who engages with the feature is more likely to convert. Now I can get the % of…

statistics

asked Nov 14 '18 at 04:58

Ronak Agrawal

206
3
11
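
For the conversion question above, one standard option is a one-sided two-proportion z-test comparing the conversion rate of users who interacted with the feature against those who did not. This is a sketch, not the accepted answer: the function name and the generated rows are assumptions, and the row layout follows the format given in the excerpt.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H1: rate_a > rate_b, using the pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value

# Made-up rows: (userID, did_interact_with_feature, did_convert)
rows = [(i, 1, 1 if i % 4 == 0 else 0) for i in range(400)]
rows += [(i, 0, 1 if i % 8 == 0 else 0) for i in range(400, 1000)]

n_used = sum(1 for _, used, _ in rows if used == 1)
n_not = len(rows) - n_used
conv_used = sum(conv for _, used, conv in rows if used == 1)
conv_not = sum(conv for _, used, conv in rows if used == 0)

z, p_value = two_proportion_z_test(conv_used, n_used, conv_not, n_not)
print(f"z = {z:.2f}, one-sided p = {p_value:.2g}")
```

A small p-value here indicates an association between feature use and conversion; because users choose whether to interact, it does not by itself show that the feature causes conversion.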

2

votes

0 answers

How to track user given some guaranteed unique but deletable data and some possibly conflicting but non-deletable data?

I am trying to track users reliably on my website so that if they are abusive, they can be banned and not come back easily (obviously this can be bypassed with TOR and such, but most trolls don't care that much). I have some data that can be set…

statistics

asked Aug 25 '17 at 16:39

Robert Moore

121
2

2

votes

1 answer

Using Diebold-Mariano test

I've got predicted results from two different types of neural networks. Now I would like to run significance testing on both of the results to prove that they do not have equal predictive accuracy. I've learnt that the only tool in the game for this…

statistics

asked Apr 02 '16 at 13:00

JannaBotna

23
3
