
SOLVED

Getting Daily Visitors for A/B Test Sample Size Estimation


Community Advisor and Adobe Champion

I'd like to get people's opinions on which methods you use to estimate sample size for A/B tests. The table below has a column for unique visitors, followed by two methods of calculating visitors per day.

The first uses the "mean" function, which sums the individual daily rows and divides by the number of rows. The issue is that visitors can be double counted (if they visit more than once in the time period, or if their visit spans multiple days).

The second divides unique visitors by a distinct count of days, that is, the deduplicated total divided by the number of days. The issue is that it can give a daily visitor number lower than any single day's total (because some customers will likely visit more than once, especially over longer time periods).
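To make the difference between the two methods concrete, here is a minimal Python sketch. The visit log, the visitor IDs, and the function names are all hypothetical and only illustrate the arithmetic; this is not how Analysis Workspace computes these metrics internally.

```python
from datetime import date

# Hypothetical visit log of (visitor_id, day) pairs. Illustrative data only.
visits = [
    ("a", date(2023, 4, 1)),
    ("b", date(2023, 4, 1)),
    ("a", date(2023, 4, 2)),
    ("c", date(2023, 4, 2)),
    ("a", date(2023, 4, 3)),
    ("b", date(2023, 4, 3)),
]


def daily_uniques(visits):
    """Count unique visitors separately for each day."""
    per_day = {}
    for visitor, day in visits:
        per_day.setdefault(day, set()).add(visitor)
    return {day: len(ids) for day, ids in per_day.items()}


def mean_of_daily_uniques(visits):
    """Method 1: sum the daily unique counts, divide by the number of days.
    A visitor seen on several days is counted once per day."""
    daily = daily_uniques(visits)
    return sum(daily.values()) / len(daily)


def deduplicated_per_day(visits):
    """Method 2: deduplicated visitors for the whole period,
    divided by the distinct count of days."""
    unique_visitors = {visitor for visitor, _ in visits}
    days = {day for _, day in visits}
    return len(unique_visitors) / len(days)


print(mean_of_daily_uniques(visits))  # 2.0
print(deduplicated_per_day(visits))   # 1.0
```

In this toy data every day has 2 unique visitors, yet the second method returns 1.0, which is lower than any single day, because only 3 distinct people visited across the 3 days.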

 

[Image: Mandy_George314_0-1683041693547.png, table of unique visitors with the two visitors-per-day calculations]

 

So my question is: which of these methods do you use to estimate your daily sample size for A/B tests? Or do you use a different method?

 

1 Accepted Solution


Correct answer by
Community Advisor

Hmmm, this really depends on your definition of "Daily Average"... Here is my two cents' worth:

If you are looking at daily granularity and actually want the average UVs per day, then the duplication in your mean should be fine. If instead you want Monthly Unique Visitors averaged per day, then yes, the duplication is going to be a problem.

 

If you are going to use the Distinct Count method, I would expect it to be applied only at a Monthly Granularity (not daily), since you are looking at your Monthly UVs, not Daily UVs. You could then show this as a Summary visualization for the monthly report (as opposed to a one-row table) or, for a 3-month or 6-month trend, as a table with a monthly breakdown.

 

If you use this method on a Daily Breakdown, you aren't really getting an average; you are essentially dividing each day's UVs by 1 (as the Day count for April 1 is actually "1"). The only reason the numbers differ is some oddities in how the Approximate Distinct Count is calculated (I think Approximate Distinct Count Days of Month seems a little more reliable, but it still has some issues).
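As a rough illustration of the point above, the sketch below applies the "unique visitors divided by distinct days" calculation row by row on a daily breakdown. The row values and the function name are made up, and the day count here is exact, so it does not reproduce the small deviations an approximate distinct count can introduce.

```python
# Hypothetical daily breakdown: one row per day with that day's unique visitors.
daily_rows = {"April 1": 120, "April 2": 95}


def uv_per_distinct_day(rows):
    """Apply 'UVs / distinct count of days' to each row of a daily breakdown.
    Each row covers exactly one day, so the divisor is always 1."""
    distinct_days_in_row = 1
    return {day: uv / distinct_days_in_row for day, uv in rows.items()}


for day, value in uv_per_distinct_day(daily_rows).items():
    # Each printed value equals the row's own unique visitor count.
    print(day, value)
```

With an exact day count the per-row result simply equals that day's UVs, so any difference seen in a real report would come from the approximation rather than from averaging.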

 

 

Both options are valid; they give different interpretations of "Average Daily", so you just need to be clear about which definition you are using.
