
I've run a test with one control group and one experiment group, and I'm questioning whether I used the right test (or whether significance can even be calculated with sample sizes like the following).

The data is as follows:

The control cohort (A) had 63 people see the treatment and 1 person performed the action (1.59%)

The experiment cohort (B) had 64 people see the treatment and 9 people performed the action (14.1%)

I used a z-test for two population proportions (this equation: http://www.socscistatistics.com/tests/ztest/) to compare the two proportions. It says that the number of people who performed the action in B is a statistically significant increase over the number of people who performed the action in A with a p-value of 0.00453.

However I wanted to make sure that:

  • a) I'm using the right test -- I know t-tests are sometimes better tests to use when sample sizes are small
  • b) Statistical significance can even be determined on such a small sample

Any help or guidance would be great - thanks!

Josh
  • I am not sure why people downvoted. I think it would be more appropriate to ask this on Cross Validated, a Stack Exchange site for statistics. – Brian Ding Aug 20 '15 at 03:44
  • Thanks @BrianDing - I'll also post this question on that site. I wasn't aware of it until now. – Josh Aug 20 '15 at 03:49
  • Asked at http://stats.stackexchange.com/q/167978/10259, as requested. – JRN Aug 20 '15 at 03:54
  • I believe this is an appropriate question for this (math) site. OP is fundamentally asking about model assumptions. – BruceET Aug 20 '15 at 04:03

1 Answer


The test you referenced is the standard z-test for the difference of two binomial population proportions. It assumes that the z-statistic is approximately normal. I get $|Z| \approx 2.61$, which agrees with the one-sided P-value of 0.00453 you report and is significant at any reasonable level of significance (including 1%). I do not see the applicability of the t distribution to this situation.
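As a quick check of these numbers, here is a short Python sketch (an editorial addition, not part of the original exchange) reproducing the pooled two-proportion z-test that the linked calculator performs:

```python
from math import sqrt
from statistics import NormalDist

# Counts from the question
x1, n1 = 1, 63    # control: successes, group size
x2, n2 = 9, 64    # experiment: successes, group size

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # pooled standard error
z = (p2 - p1) / se

# One-sided P-value, the figure the calculator reports
p_one_sided = 1 - NormalDist().cdf(z)

print(round(z, 2))            # ≈ 2.61
print(round(p_one_sided, 3))  # ≈ 0.005
```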

However, because the counts of 'successes' are so small, I wondered about the validity of the normality assumption. So I did a Fisher exact test in R, obtaining a P-value between 1% and 2%, so it certainly seems legitimate to reject the null hypothesis (that the two groups react similarly) at the 5% level.
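The Fisher exact P-value can be reproduced without R. The sketch below (again an editorial addition) sums the hypergeometric probabilities of all 2x2 tables at least as extreme as the observed one -- the usual two-sided convention, and the one R's fisher.test uses:

```python
from math import comb

# Counts from the question: control 1/63, experiment 9/64
x1, n1 = 1, 63
x2, n2 = 9, 64
N, K = n1 + n2, x1 + x2   # total people, total successes

def hyper(k):
    # P(exactly k of the K successes fall in the control group),
    # hypergeometric with fixed margins
    return comb(K, k) * comb(N - K, n1 - k) / comb(N, n1)

p_obs = hyper(x1)
# Two-sided Fisher P-value: sum probabilities of all tables whose
# probability does not exceed that of the observed table
p_two_sided = sum(hyper(k) for k in range(0, min(K, n1) + 1)
                  if hyper(k) <= p_obs)

print(round(p_two_sided, 4))   # ≈ 0.0169, i.e. between 1% and 2%
```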

Addendum: More specifically, one often sees the following rule of thumb for deciding whether the test statistic in a z-test is sufficiently near normal to give reliable P-values: the minimum of these four numbers -- the two success counts and the two failure counts -- should be at least 5 (less-fussy authors say 4). (In your case you'd have $\min(1, 9, 62, 55) = 1 < 5,$ and so you shouldn't rely on the z-test.) However, in your case it isn't anywhere near a close call whether to reject at the 5% level.
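The rule-of-thumb check amounts to one line of arithmetic; a minimal sketch (editorial addition):

```python
# Rule-of-thumb check: all four cell counts (successes and failures
# in each group) should be at least 5 for the normal approximation
# behind the z-test to be trusted.
counts = [1, 9, 63 - 1, 64 - 9]   # successes A, successes B, failures A, failures B
print(min(counts) >= 5)            # False here, so prefer the exact test
```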

BruceET
  • Thanks Bruce, this helps! If I have other tests that are similar to this one in the future (I expect to), would you recommend using the Fisher exact test over a z-test that assumes the z-statistic is approximately normal? – Josh Aug 20 '15 at 04:19
  • For large enough sample sizes (> 50 or so) with proportions of successes say roughly between 20 and 80 percent, the normal test should be fine. If in doubt, just do both and see if results are reasonably coherent. If still in doubt, post another question with the particulars of the data and your concerns. – BruceET Aug 20 '15 at 04:40
  • Makes sense. Do both samples have to have successes in the 20-80% range, or is it ok if it's just one of the samples? (Reason I ask is because I suspect a lot of the control cohorts will have successes of far less than 20%) – Josh Aug 20 '15 at 04:48
  • I have included an Addendum in my Answer that provides advice that is somewhat more precise. (My last comment envisioned sample sizes in the 60-70 range.) – BruceET Aug 20 '15 at 19:23
  • I'm beginning to wonder whether we have explored the *relevant* issues sufficiently. It seems you are running a sequence of tests in which you expect the success probability in Group 2 to exceed that of Group 1. Then (a) you should be running one-sided tests and (b) you should be concerned about the *power* of your tests, that is, the probability of detecting a real difference. If you want to start a new question to explore z vs. exact tests, and sample size to achieve reasonable power, then I (or someone else) might have useful advice to offer. Include a little more context. – BruceET Aug 20 '15 at 20:28
  • Thanks - posted: http://math.stackexchange.com/questions/1408428/i-am-running-a-series-of-experiments-that-i-expect-to-have-similar-outcomes-wha Let me know if you need any more detailed info. – Josh Aug 24 '15 at 21:03