We ran some A/B/C/D tests that concluded quickly (normally within a week) with a winner at a high confidence level. However, after we left the tests running for another 2 weeks, the confidence level dropped, and the tests then concluded with a completely different winner at 95%+ confidence. Has anyone had the same experience, and how do you deal with this situation? In some cases, the variation that performed worst when the test first concluded became the winner when it concluded the second time.

Here is some pertinent information from this KB article: Confidence Level and Confidence Interval

If the confidence level is over 90% or 95%, the result can be considered statistically significant. Before making any business decisions, wait until your sample size is large enough and the four bars of confidence on one or more experiences stay consistent for a continuous length of time, to ensure the results are stable.
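One way to make "stays consistent for a continuous length of time" concrete is to log the leading experience and its confidence once a day, then only call the test when the same leader has held the threshold for a fixed number of consecutive days. The sketch below is a rough illustration, not anything built into the tool: the `is_stable` helper, the daily `(leader, confidence)` log format, and the 14-day default window are all assumptions.

```python
from typing import Sequence, Tuple


def is_stable(
    daily: Sequence[Tuple[str, float]],
    threshold: float = 90.0,  # four-bar level from the KB article
    min_days: int = 14,       # assumed window; pick what fits your traffic
) -> bool:
    """Return True if one experience led at >= threshold for the last min_days entries.

    `daily` is a hypothetical log of (leading_experience, confidence_percent),
    one entry per day, oldest first.
    """
    if min_days < 1:
        raise ValueError("min_days must be at least 1")
    if len(daily) < min_days:
        return False  # not enough history to judge stability
    recent = daily[-min_days:]
    leaders = {leader for leader, _ in recent}
    return len(leaders) == 1 and all(conf >= threshold for _, conf in recent)


# Example: B leads for a week, then D takes over for two weeks.
history = [("B", 96.0)] * 7 + [("D", 95.0)] * 14
print(is_stable(history[:7]))   # early "winner" after one week
print(is_stable(history[:14]))  # leader changed inside the window
print(is_stable(history))       # D has held for 14 straight days
```

With a check like this, the early one-week "winner" in the original question would not have been declared, because it had not yet held its lead for the full window.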

The following list shows the meaning of the number of confidence bars:

One bar: significance < 60%

Two bars: significance < 75%

Three bars: significance < 90%

Four bars: significance >= 90%
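If you export the significance values and want to reproduce the bar count in your own reports, the list above can be sketched as a small lookup. The function name `confidence_bars` is made up for illustration, and the lower-inclusive boundaries (for example, exactly 60% counting as two bars) are inferred from the "<" and ">=" signs in the list rather than confirmed by the KB article.

```python
def confidence_bars(significance_pct: float) -> int:
    """Map a significance percentage to the number of confidence bars.

    Boundaries follow the KB list: <60 -> 1, <75 -> 2, <90 -> 3, >=90 -> 4.
    """
    if significance_pct >= 90:
        return 4
    if significance_pct >= 75:
        return 3
    if significance_pct >= 60:
        return 2
    return 1


for pct in (55.0, 60.0, 89.9, 95.0):
    print(pct, confidence_bars(pct))
```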

Hope this has been helpful to you!

