Why does the sample size calculator overestimate the sample size actually needed?
miguelm62125791
Level 2
February 2, 2017
Solved


Why does the sample size calculator overestimate the sample needed? With fewer visitors than estimated, the results are already conclusive.

For example:

With 80% power, a 95% confidence level, a baseline conversion rate of 10%, 2 offers, and 1,000 daily visitors, the calculator says that you will need a sample size of 14,748 visitors to detect a lift of 10%.

 

However, with 10,000 visitors per offer, you can detect a lift of 10% with a 98.89% confidence level.

 

Thank you in advance for your help. 

Best answer by cki_phylo

Hi Rohit, I'm a product manager on Target; let me try to address your concern. First and most importantly, the sample size calculator does not provide an estimate. It stipulates the minimum sample size required to guarantee that your false-positive rate (i.e. 1 minus the confidence level) is bounded. This means that if you want 95% confidence (a 5% false-positive rate), you MUST wait until this sample size has been reached in order to guarantee that a test yields a false positive only 1 time out of 20 (i.e. 5%).

Only after the test has reached the sample size should you look at the confidence value and check that it is indeed above 95%. If the confidence value is below 95% once the sample size has been acquired, your test is inconclusive at the 95% significance threshold. If all of this didn't make sense, here is a simple 3-step workflow for doing A/B testing correctly:

 

1. Compute the sample size with the desired significance (say 95%) and your most accurate guesses for "Baseline CR" and "Minimum detectable lift". If you have more than 2 experiences, don't forget to apply the Bonferroni correction.

2. Wait until each experience has acquired this sample size.

3. Only at this point, evaluate whether the Confidence value shown in the Reports is above 95%. If it's not, your test is inconclusive and you do not have a winner for this test.
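Step 1 of the workflow above can be sketched in a few lines. This is a minimal sketch that assumes the textbook formula for a two-sided two-proportion z-test; it is not confirmed to be the formula Adobe Target's calculator uses. The function name `sample_size_per_offer` and its parameters are illustrative, and `comparisons` is a hypothetical knob for the Bonferroni correction (the number of comparisons against control).

```python
from statistics import NormalDist


def sample_size_per_offer(baseline_cr, relative_lift, confidence=0.95,
                          power=0.80, comparisons=1):
    """Visitors needed per offer under a two-sided two-proportion z-test.

    Assumption: standard normal-approximation formula, not necessarily
    what any particular vendor calculator implements.
    """
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    # Bonferroni correction: split the false-positive budget across comparisons.
    alpha = (1 - confidence) / comparisons
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2


# Inputs from the question: 10% baseline, 10% relative lift, 95% / 80%.
n = sample_size_per_offer(0.10, 0.10)
print(round(n))
```

For the inputs in the question this comes out at roughly 14,748 visitors per offer, which is consistent with the figure quoted from the calculator. Raising `comparisons` above 1 increases the required sample.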

 

I understand this is something you may not have done before, but our years of analysis have shown that if users don't wait until the sample size is reached, their tests are 56% likely to find a false positive (i.e. a 'winner' that actually performs worse than control in reality).
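The effect of stopping early can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the 56% figure above: both offers share the same true conversion rate (so every "winner" is a false positive), significance is checked with a pooled two-proportion z-test, and the number of looks, visitors per look, and run count are arbitrary illustrative values.

```python
import random
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% threshold


def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test with n visitors in each offer."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    if se == 0:
        return False
    return abs(conv_a - conv_b) / n / se > Z_95


def simulate(runs=500, looks=20, visitors_per_look=200, cr=0.10, seed=1):
    """Return (false-positive rate when peeking, rate at the final look only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _run in range(runs):
        conv_a = conv_b = n = 0
        ever_significant = False
        significant_now = False
        for _look in range(looks):
            # Both offers convert at the same true rate: there is no real winner.
            conv_a += sum(rng.random() < cr for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < cr for _ in range(visitors_per_look))
            n += visitors_per_look
            significant_now = is_significant(conv_a, conv_b, n)
            ever_significant = ever_significant or significant_now
        peeking_hits += ever_significant   # stopped at the first "significant" look
        fixed_hits += significant_now      # evaluated once, at the planned sample size
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate()
print(f"false positives when peeking: {peeking_rate:.1%}")
print(f"false positives at fixed sample size: {fixed_rate:.1%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the fixed-sample rate; how much higher it is depends on how often the results are checked.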

Hope that helps!

12 replies

Adobe Employee
February 23, 2017

Great feedback, Miguel, and something definitely worth considering. One benefit of the current approach is that it gives you the flexibility to accept your own range of false positives (whether or not you use the calculator) via the p-value. The more structure in place in the UI around reporting, the less flexibility you would have. But I definitely hear you!

miguelm62125791
Level 2
March 22, 2017

Hi, 

I am still thinking about this. The following Sample Size Calculator seems to use the same mathematical model to estimate the sample size as the one used to calculate the significance.

For instance, these values represent a significant test result:

Unique visitors expected per variation: 7500

Number of expected conversions: 750

Baseline CR: 10%

Expected Uplift: 10%

However, the Adobe Sample Size Calculator estimates a required sample almost twice as large (14,748 visitors per variation).

What is the mathematical model behind the Sample Size Calculator? It seems clear that it is not the same one used to calculate the significance level.
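One way to compare the two numbers is to compute the statistical power at each sample size. The sketch below assumes a two-sided two-proportion z-test with a normal approximation; whether either calculator uses exactly this model is an assumption. The function name `power_two_proportions` is illustrative, and the negligible probability of rejecting in the wrong direction is ignored.

```python
from statistics import NormalDist


def power_two_proportions(n_per_offer, baseline_cr, relative_lift,
                          confidence=0.95):
    """Approximate chance of reaching significance if the true lift is real.

    Assumption: two-sided z-test, unpooled variance, normal approximation.
    """
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_offer) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Probability that the observed z-score clears the significance threshold.
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


power_7500 = power_two_proportions(7500, 0.10, 0.10)
power_14748 = power_two_proportions(14748, 0.10, 0.10)
print(f"power at 7,500 per variation:  {power_7500:.1%}")
print(f"power at 14,748 per variation: {power_14748:.1%}")
```

Under these assumptions, 7,500 visitors per variation is about the point where an observed lift of exactly 10% just crosses 95% significance, but the chance of actually observing a lift that large when the true lift is 10% is only around one in two. At 14,748 visitors per variation that chance rises to about 80%, which is the power setting used in the original example.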