Have a simple question:
To conclude an A/B test, do we need to make sure every device segment (at least desktop and mobile) reaches significance on its own? Or is it enough to calculate significance for the overall result?
I would keep the test running, as this will help guard against false positives.
Many thanks, mravlich. At the moment, we announce a winner once the overall minimum sample size has been reached. I'm just wondering if we should keep a test running until both desktop and mobile reach significance, rather than doing post-test segmentation.
With post-test segmentation, each segment (in your example, mobile is one segment and desktop is another) is effectively a separate test. If there is no real difference in the conversion rate, the probability of a false positive in any one segment equals the significance level. So the more segments you test, the more likely it is that at least one of them shows a false positive. This should not stop you and your team from doing post-test segmentation, as it is very valuable. There are two ways around the problem:
1) Run a new test on only the segment identified in post-test segmentation (e.g., mobile). This will help verify the results from the previous test.
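2) Apply the Bonferroni correction. Divide the overall significance level by the number of comparisons to get the significance level each comparison must meet in order to keep an overall 95% confidence level. For example, with two segments and an overall significance level of 0.05, each segment is tested at 0.05 / 2 = 0.025.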
See the example of Bonferroni correction on the following page: Nine Common A/B Testing Pitfalls and How to Avoid Them
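To make the arithmetic concrete, here is a rough Python sketch of the Bonferroni approach for the desktop/mobile case. The conversion counts are made up purely for illustration, the function names are my own, and the significance check is a plain two-sided two-proportion z-test, which may not be what your testing tool uses. The false positive calculation also assumes the segments are independent.

```python
import math

def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive across n independent
    comparisons when there is no real difference in any of them."""
    return 1 - (1 - alpha) ** n_comparisons

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal, erfc(|z| / sqrt(2)) is the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data, not from a real test:
# (control conversions, control visitors, variant conversions, variant visitors)
segments = {
    "desktop": (200, 5000, 260, 5000),
    "mobile": (150, 5000, 165, 5000),
}

alpha = 0.05
threshold = bonferroni_alpha(alpha, len(segments))  # 0.05 / 2 = 0.025

print(f"Uncorrected chance of any false positive: "
      f"{familywise_error_rate(alpha, len(segments)):.4f}")

results = {}
for name, counts in segments.items():
    p_value = two_proportion_p_value(*counts)
    results[name] = p_value <= threshold
    print(f"{name}: p = {p_value:.4f}, significant at {threshold}: {results[name]}")
```

With two segments tested at 0.05 each, the uncorrected chance of at least one false positive is about 0.0975 rather than 0.05, which is why each segment is held to the stricter 0.025 threshold.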