iamjasona
Level 2
August 11, 2020
Solved

Number of Days using Approximate Count Distinct

  • August 11, 2020
  • 1 reply
  • 1527 views

In the Average Daily metrics I've seen (e.g., Average Daily Visits, Average Daily Revenue), there is a technique that uses the Approximate Count Distinct function to calculate the number of days over a time dimension of your choosing. It works pretty well, except that every once in a while it comes back missing a day.

 

Below is an example using this function broken down by week. Across all report suites we support, the week of July 12, 2020 shows about 6 days instead of 7. The same thing happened last year during the week of Sept 22, 2019. As a result, these weeks showed large spikes in Average Daily Visits, Revenue, etc., but the spikes are really just due to an incorrect number of days in the denominator.
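To make the size of the spike concrete, here is a minimal sketch of the arithmetic. The weekly total is a made-up number, not a value from the report, and `average_daily` is a hypothetical helper rather than anything built into the product.

```python
def average_daily(total: float, day_count: float) -> float:
    """Average daily value: the metric total divided by the number of days."""
    if day_count <= 0:
        raise ValueError("day_count must be positive")
    return total / day_count


# Hypothetical weekly total (assumption for illustration only).
weekly_visits = 70_000

expected = average_daily(weekly_visits, 7)  # denominator with the correct day count
observed = average_daily(weekly_visits, 6)  # denominator when one day goes missing

# Relative inflation caused purely by the undercounted denominator.
inflation = observed / expected - 1
print(f"expected={expected:.0f} observed={observed:.0f} inflation={inflation:.1%}")
```

Whatever the weekly total is, dividing by 6 instead of 7 inflates the average by a factor of 7/6, i.e. roughly 17%.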

 

I recognize the function is "approximate", but the value is very consistent across other weeks of the year; I gather the "approximate" part of the metric deals more with larger volumes of data.  Any idea what's happening here?

 

 

1 reply

Megha2390
Accepted solution
Level 4
August 12, 2020

Approximate Count Distinct (dimension) returns the approximate distinct count of dimension items for the selected dimension. The function uses the HyperLogLog (HLL) method of approximating distinct counts, and the values it returns can vary by about ± 5%. As a result, the number you see may differ from the exact count of unique items for a dimension.

However, it would be good to raise a ticket with Client Care to get clarity on this issue.
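For readers unfamiliar with HyperLogLog, the following is a simplified, textbook-style sketch of how such an estimator works. It is not Adobe's implementation: the class name, the register count (`p=14`), the SHA-1 hash, and the day strings are all assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog estimator (illustrative, not a product implementation)."""

    def __init__(self, p: int = 14):
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash value.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting) for low cardinalities.
            return m * math.log(m / zeros)
        return raw


# Hypothetical input: the seven days of one week, each seen on many hits.
sketch = HyperLogLog()
for day in range(12, 19):
    for _ in range(100):
        sketch.add(f"2020-07-{day}")
print(sketch.estimate())
```

Repeated values never change the registers, so only distinct items move the estimate. In a sketch like this, two distinct items that happen to hash to the same register are counted once at low cardinalities, which is one way an approximate counter can come up short; whether that is what happens in the report above is something only Client Care can confirm.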