Number of Days using Approximate Count Distinct

iamjasona

11-08-2020

Some of the Average Daily metrics I've seen (e.g., Average Daily Visits, Average Daily Revenue) use the Approximate Count Distinct function to calculate the number of days over a time dimension of your choosing. It works pretty well, except that every once in a while it comes back missing a day.

 

Below is an example using this function broken down by week. Across all report suites we support, the week of July 12, 2020 shows about 6 days instead of 7. The same thing happened last year during the week of Sept 22, 2019. As a result, these weeks showed large spikes in Average Daily Visits, Revenue, etc., but that's really just due to an incorrect number of days in the denominator.
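To put a number on the spike, here is a small Python sketch of the arithmetic. The weekly total is a made-up figure and `average_daily` is a hypothetical helper, not anything from the product; the point is only that a denominator of 6 instead of 7 inflates any daily average by about one sixth.

```python
def average_daily(total: float, days: float) -> float:
    """Average daily value: a total divided by the number of days."""
    return total / days

# Hypothetical weekly total, chosen only to make the arithmetic easy to read.
weekly_visits = 70_000

true_avg = average_daily(weekly_visits, 7)    # correct day count
skewed_avg = average_daily(weekly_visits, 6)  # day count that came back one short

# Relative inflation caused by the short denominator: 7/6 - 1
inflation = skewed_avg / true_avg - 1
print(f"true={true_avg:.1f} skewed={skewed_avg:.1f} inflation={inflation:.1%}")
```

The inflation does not depend on the total, so every Average Daily metric for the affected week would jump by the same proportion.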

 

I recognize the function is "approximate", but the value is very consistent across other weeks of the year; I gather the "approximate" part of the metric deals more with larger volumes of data.  Any idea what's happening here?

 

[Screenshot: Approximate Count Distinct day counts broken down by week]

 

Accepted Solutions (0)

Answers (1)

mesood

12-08-2020

Approximate Count Distinct (dimension) returns an approximate distinct count of dimension items for the selected dimension. The function uses the HyperLogLog (HLL) method of approximating distinct counts, and the values it returns can vary by ± 5%. Hence you may see the number differ from the total count of unique items for a dimension.
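For some intuition about what "approximate" means with HLL, below is a minimal textbook-style HyperLogLog sketch in Python. It is not Adobe's implementation: the function name `hll_estimate`, the 4096 registers, and the SHA-1 hash are all assumptions picked for illustration.

```python
import hashlib
import math

def hll_estimate(items, p: int = 12) -> float:
    """Estimate the number of distinct items with a basic HyperLogLog.

    p is the number of hash bits used to pick a register (m = 2**p registers).
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = h >> (64 - p)                        # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)               # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: use linear counting while registers are still empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw

# Seven distinct days, each seen many times (duplicates do not change the estimate).
week = [f"2020-07-{day:02d}" for day in range(12, 19)] * 1000
days_estimate = hll_estimate(week)

# A much larger set, where the estimate is only expected to be close.
large_estimate = hll_estimate(range(50_000))
print(days_estimate, large_estimate)
```

In this sketch, a handful of distinct values is handled by the small-range correction and normally lands very close to the true count, while large sets show the percentage-level error. How the product's own version behaves at small counts may differ.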

 

However, it would be worth raising a ticket with Client Care to get clarity on this issue.