Hello,
I am running an A/A test and have started including a second control in all my tests (10% of traffic goes to the second control) so I can do an A/A comparison in every test. The issue is that the two controls perform differently a lot of the time, even with an even 50/50 split.
Questions:
- At what point do you consider two controls to be performing similarly enough to trust the baseline? Is it within ±3% on the conversion rate for the KPI? It is hard to allow anything higher if your power analysis has a minimum detectable effect of 5%.
- How big a sample does Target need to determine that two controls have the same performance? What can affect that time? I am wondering whether I should do a typical pre-test duration calculation/power analysis and then run the tests longer if the controls are not performing the same.
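
For context, here is a rough sketch of the kind of pre-test power analysis I mean. It uses the standard normal-approximation formula for a two-sided two-proportion z-test, which may not be what Target does internally. The function name `sample_size_per_arm`, the 5% baseline conversion rate, the 0.05 significance level, and the 80% power are all placeholder assumptions for illustration, not values from my tests.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH arm to detect a relative lift.

    Normal-approximation formula for a two-sided two-proportion z-test.
    All defaults are illustrative assumptions, not Target settings.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate at the minimum detectable effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two binomial variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical 5% baseline conversion rate; compare a 5% and a 3% relative MDE.
    for mde in (0.05, 0.03):
        print(f"relative MDE {mde:.0%}: {sample_size_per_arm(0.05, mde):,} visitors per arm")
```

If this is roughly right, a sample sized for a 5% minimum detectable effect is not large enough to reliably tell whether two controls sit within 3% of each other, which may be part of why the controls so often look different.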
The biggest issue with the A/A test results is that they call into question the previous tests we have run, as we have no idea whether the control performance was a true representation of the baseline.
Thank you for any help!