Highlighted

ClickStream Duplicate Visits

craigtarr

01-03-2018

Has anyone encountered a scenario where the visit shows a slightly different visit_start_time_gmt, only seconds apart in their feed?

We have a fully combined post_visid_high, post_visid_low, visit_num AND visit_start_time_gmt. A specific case is a visitor who had 7 page views in the visit. the visit_start_time_gmt actually has a different value for page view 5. So it makes it look like they initiated a new visit in the data feed.

Page 49 of the analytics clickstream document states,

*In rare circumstances, internet irregularities, system irregularities, or the use of custom visitor IDs can result in duplicate  visit_num values for the same visitor ID that are not the same  visit. To avoid resulting issues, also include  visit_start_time_gmt when counting visits.

Not sure I can do this when the gmt value has slightly different values.

Replies

Highlighted

mattm69689282

Employee

09-03-2018

Hi Craig,

I may be missing some of the details of your challenge, but thought below may be of assistance.

I lead a team of engineer consultants at Adobe that custom process the clickstream data feeds for other Adobe customers.

Typically, we use visid_high & visid_low (post) and bring in visit_num (as a unique key) and add & sort by the hit_time_gmt. This timestamp is set at collection on every web beacon. Reviewing this specific timestamp along with the visit number that is processed during cache based on the virtual cookie table on the Adobe side, should provide what you're looking for. At least, it should give you another reference to view the data another way.

Also, I have a few basic data feed query examples here:

Using Adobe Analytics Data Feeds and SQL for Basic Reporting – The Data Feed Toolbox

And I'll be presenting at Summit in a technical lab if interested:

Become an Adobe Analytics Data Feed Ninja

This may not fully answer your question, but hope this is helpful.

Best,

Matt

Highlighted

craigtarr

09-03-2018

Matt thanks for the courage in responding. I have passed the info over to my data analysts.

In our unique situation the hit_time_gmt seems to be out of sequence for at least 1 visitor, possibly more. And that sequence issue is causing the visit_num to appear like this is a new visit, instead of a continuation.

I've scoured the Internet looking for someone else who has talked about this but haven't seen any duplication. And like the documentation says, in rate instances, well we hit that needle in a hay stack!

Highlighted

Gigazelle

Employee

12-03-2018

Do you know if the report suite you're reporting on has timestamped hits enabled? Also, are we looking at one-off issues that could potentially be chalked up to hash collisions, or are we talking about systemic issues that would affect business decisions you'd make with the data?

Highlighted

craigtarr

21-03-2018

No, i confirmed that we do not have timestamped hits enabled.

From the limited data set we are looking at (only one report suite out of 12) and only a subset of the year it could be business impacting. My hope is that it's a one time hiccup. Will continue to dig deeper to see if it continues.

In what way would timestamped hits make a difference? We didn't really want to use hit_time_gmt but documentation seems to be pointing us in that direction.

Highlighted

clayowensby

27-03-2018

Hi Matt,

I work with Craig.  We loaded two sets of data and have found mixed results when following the Adobe documentation and queries in the above links.  When looking for page views we followed the ClickStream documentation and found that the below query achieves the numbers found on the key metrics report within Adobe Analytics.

select count(*)

from dataTable

where POSTPAGEEVENT = 0

and EXCLUDEHIT = 0

and HITSOURCE = 1

Can you speak to the difference from the below query which produces a different result from the key metrics report mentioned above.

select count(*)

from dataTable

where ((pagename != '') or (pageurl != ''))

and EXCLUDEHIT = 0

and HITSOURCE = 1

When calculating visits, the documentation wants to group by POSTVISIDHIGH, POSTVISIDLOW, VISITNUM and VisitStartTimeGMT.  This grouping gives me a different result from the key metrics report.  When I use the below query, the result match.

select count(*) from (

select count(*) c

from [FINS_WM_STG].[dbo].[STG_FINS_FACTWMSESSION] stg

where EXCLUDEHIT = 0

and HITSOURCE = 1

Group by POSTVISIDHIGH, POSTVISIDLOW, VISITNUM

) a

Again, I ran the above queries on two sets of data and the results were the same.  Can you speak to these differences?