
SOLVED

Number of Days using Approximate Count Distinct


In the Average Daily metrics I've seen (e.g., Average Daily Visits, Average Daily Revenue), there is a technique that utilizes the Approximate Count Distinct function to calculate the number of days over a time dimension of your choosing.  It works pretty well...except, every once in a while, it comes back missing a day.
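For comparison, here is a minimal sketch of the same "total divided by number of days" calculation done with an exact distinct count instead of Approximate Count Distinct. It assumes daily rows of `(date, value)` exported from a report and Sunday-start weeks. The function names and data layout are hypothetical and are not part of Adobe Analytics.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Sunday that starts the week containing `day` (assumed week convention)."""
    # date.weekday(): Monday == 0 ... Sunday == 6, so a Sunday maps to an offset of 0
    return day - timedelta(days=(day.weekday() + 1) % 7)


def average_daily_by_week(daily_rows):
    """Per-week average = week total / number of distinct days seen in that week.

    daily_rows: iterable of (day, value) pairs, e.g. (date(2020, 7, 12), 1234).
    Uses an exact distinct count (a set), unlike Approximate Count Distinct.
    """
    totals = defaultdict(float)
    days = defaultdict(set)
    for day, value in daily_rows:
        wk = week_start(day)
        totals[wk] += value
        days[wk].add(day)  # several rows for the same day still count as one day
    return {wk: totals[wk] / len(days[wk]) for wk in totals}
```

Comparing this exact per-week day count with the Approximate Count Distinct value for the same weeks is one way to check whether the denominator, rather than the underlying traffic, explains a spike.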

Below is an example using this function broken down by week.  Across all the report suites we support, the week of July 12, 2020 shows about 6 days instead of 7.  The same thing happened last year during the week of Sept 22, 2019.  As a result, these weeks showed large spikes in Average Daily Visits, Revenue, etc., but that's really just due to an incorrect number of days in the denominator.
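To put a number on the denominator effect: if a week's total is $T$ and the day count comes back as $\hat{d}$ instead of 7 (both symbols introduced here for illustration), every per-day average for that week is scaled by $7/\hat{d}$:

```latex
\frac{T/\hat{d}}{T/7} = \frac{7}{\hat{d}}, \qquad \hat{d} = 6 \;\Rightarrow\; \frac{7}{6} \approx 1.167
```

So a week counted as 6 days shows Average Daily Visits, Revenue, etc. roughly 17% higher than the true 7-day average, with no change in the underlying totals.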

I recognize the function is "approximate", but the value is very consistent across other weeks of the year; I gather the "approximate" part of the metric deals more with larger volumes of data.  Any idea what's happening here?

[Screenshot: number of days from Approximate Count Distinct, broken down by week]

1 Accepted Solution

Correct answer by

Approximate Count Distinct (dimension) returns an approximate count of the distinct items in the selected dimension. It uses the HyperLogLog (HLL) method to approximate distinct counts, so the value it returns can vary by about ±5%. As a result, you may see some variation in the number of unique items reported for a dimension.

However, it would be worth raising a ticket with Client Care to get clarity on this issue.
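For anyone curious where the approximation comes from, here is a minimal sketch of a textbook HyperLogLog estimator with the standard small-range (linear counting) correction. The function names, the SHA-1 hash, and the default of p=14 (16,384 registers) are illustrative assumptions; this is not Adobe's implementation, and its parameters are unknown here.

```python
import hashlib
import math


def _hash64(item: str) -> int:
    """Map an item to a 64-bit integer (SHA-1 chosen here only for convenience)."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def hll_estimate(items, p: int = 14) -> float:
    """Estimate the number of distinct items with a basic HyperLogLog sketch.

    p = number of hash bits that select a register; m = 2**p registers.
    """
    if not 7 <= p <= 16:
        raise ValueError("p must be in [7, 16] for the alpha constant used below")
    m = 1 << p
    width = 64 - p                       # bits left after choosing the register
    registers = [0] * m
    for item in items:
        h = _hash64(str(item))
        index = h >> width               # top p bits choose the register
        rest = h & ((1 << width) - 1)    # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)     # bias-correction constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: linear counting on the still-empty registers
        return m * math.log(m / zeros)
    return raw
```

For example, `round(hll_estimate(f"2020-07-{d:02d}" for d in range(12, 19)))` estimates the seven days of that week. In this textbook version, small counts rely on linear counting: if two distinct items land in the same register, the estimate comes out about one short. The typical error at large counts is about 1.04/√m (roughly 0.8% for p=14). Whether anything like this explains the missing day in Adobe Analytics is not established by this sketch.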
