Under what scenario would you not split 50-50?



I would like to hear from other Target or optimization experts: have you ever run an A/B test where you did not split traffic 50-50 between the control and the challenger, and if so, what was the reason behind it?

I understand that a 50-50 split reaches statistical significance faster, thus reducing the testing period.

What if I do a 20-80 or 80-20 split? If I do, how do I know how much traffic I need per variant to reach statistical significance?







The majority of the time I do an equal split, but here are some cases when I do 90/10 or 80/20. There are a bunch of confidence calculators that will tell you how long something should be tested with an unequal split.

1. 90/10 - Called a "hold-out." I use this when I plan on rolling out a new feature or creative to everyone without fully A/B testing it, usually for things that have to happen due to technology upgrades or site standards pushes. The 10% hold-out provides a baseline for how the new feature performs against the default, so you can tell whether the new feature is hurting conversion.

I also do 90/10 splits when ramping a new product feature, to make sure the new feature can handle the server request load.

In both of these scenarios, I usually don't wait for significance; they provide directional data only.

2. 80/20 - Similar to a hold-out, I've run 80/20 when running a price test or a test that could strongly hurt the business if it lost. You start out by displaying the new pricing and package mixes to 20% of traffic and see how they perform against the default. This is just a safeguard. If everything checks out, you can then ramp up to an even 50/50 and wait for significance.

I've seen these types of distributions used for promotional banners and discount codes as well. They keep the discount or promotion more exclusive, since a smaller percentage of visitors sees it.

There are a lot of different reasons that could be applied depending on what you are trying to accomplish for the needs of the company.
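To make the calculator math concrete: for the earlier question about traffic per variant, a common approach is the two-proportion z-test power formula with an allocation ratio. Here is a minimal sketch (the function name, conversion rates, alpha, and power values are my own illustrative assumptions, not from any specific calculator):

```python
import math
from statistics import NormalDist

def sample_sizes(p_control, p_variant, variant_share=0.5,
                 alpha=0.05, power=0.8):
    """Per-arm sample sizes for a two-proportion z-test when
    `variant_share` of traffic is sent to the challenger."""
    # kappa = ratio of variant traffic to control traffic
    # (e.g. an 80/20 split with 20% to the variant gives kappa = 0.25)
    kappa = variant_share / (1 - variant_share)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = (p_control * (1 - p_control)
                + p_variant * (1 - p_variant) / kappa)
    n_control = (variance * (z_alpha + z_beta) ** 2
                 / (p_control - p_variant) ** 2)
    return math.ceil(n_control), math.ceil(kappa * n_control)

# 5% baseline conversion, hoping to detect a lift to 6%
even = sample_sizes(0.05, 0.06, variant_share=0.5)    # 50/50 split
uneven = sample_sizes(0.05, 0.06, variant_share=0.2)  # 80/20 split
print(even, sum(even))
print(uneven, sum(uneven))
```

The uneven split needs a smaller variant arm but a larger total sample, which is why unequal splits take longer to reach significance at the same traffic level.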




In addition to Russ' response, another reason we use 90/10 or 80/20 splits is to create a baseline audience while running personalisation campaigns. In these scenarios, 90% or 80% of the traffic is shown the personalised experience while the rest are shown the default experience. This makes it easier for us to assess the incremental business benefit of personalisation campaigns.

Hope this helps,





Can you give some links to the "bunch of confidence calculators" that will help calculate traffic needed for an uneven traffic split?