Segmentation: Bot Traffic Identification & Exclusion Tool

1/9/17

I need help identifying bot traffic because we get a ton of it: somewhere between 30-40% of our page views and 5-15% of visits. I am not referring to known bots (Google Spider) or malicious bots attempting to take down our site or defraud us. I am referring to third-party scrapers coming to us for information. This type of traffic is not viewed negatively because 1) it is not harmful to our site experience, 2) everyone does it, and 3) it is difficult to police.

Because we get so much bot traffic, we spend a chunk of time working out whether swings in our KPIs are real or due to non-human traffic. This slows us down considerably. The bots coming to our site use standard devices, user agent strings, and operating systems, and also change their IP addresses frequently. I am able to qualitatively identify this traffic because of the following:

1. This traffic is typed/bookmarked.

2. This traffic never has any of our campaign parameters.

3. This traffic lands on pages that would not normally be a direct landing page (e.g., a specific product page).

4. This traffic is from the 'Other' device type.

5. Page Views = 1 per visit.

6. Visits = Visitors, and visits are very high (i.e., > 1k) when looking at captured IP addresses.

So, whoever is crawling our site is deleting their cookies while staying on the same IP address and viewing a single page per visit. See the attached screenshot.

It would be great to somehow aggregate visits from different visitors (cookies) where certain behaviors are taking place. For example:
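To make signals 5 and 6 concrete, here is a rough Python sketch of the check I have in mind, run against exported visit-level data rather than inside Adobe. The field names (`ip`, `visitor_id`, `page_views`) and the function name are placeholders I made up for illustration, not Adobe column names, and it assumes the IP address is being captured on each visit.

```python
from collections import defaultdict

def suspicious_ips(visits, min_visits=1000):
    """Return IPs where every visit is a new visitor viewing exactly one page.

    `visits` is an iterable of dicts with hypothetical keys:
    ip, visitor_id, page_views (one dict per visit).
    """
    stats = defaultdict(lambda: {"visits": 0, "visitors": set(), "single_page": 0})
    for v in visits:
        s = stats[v["ip"]]
        s["visits"] += 1
        s["visitors"].add(v["visitor_id"])
        if v["page_views"] == 1:
            s["single_page"] += 1

    return sorted(
        ip
        for ip, s in stats.items()
        if s["visits"] > min_visits                # signal 6: very high visit count
        and len(s["visitors"]) == s["visits"]      # signal 6: Visits = Visitors
        and s["single_page"] == s["visits"]        # signal 5: Page Views = 1 per visit
    )
```

The 1,000-visit default mirrors the "> 1k" figure above; it is a starting point to tune, not a tested threshold.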

Exclude all 'Visitors' if

1. 'Any value' for a given variable (eVar/prop) shows up more than X times.

AND

2. PVs per Visit for each visit <= 1

AND

3. Traffic Source for all visits is typed/bookmarked.
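Outside of Adobe, the rule above could be sketched roughly like this in Python. This is only one reading of the rule: "shows up more than X times" is counted in visits, and all three conditions are checked across every visit that shares the same variable value. The field names (`visitor_id`, `page_views`, `traffic_source`), the `"Typed/Bookmarked"` label, and the function itself are assumptions for illustration, not anything Adobe provides.

```python
from collections import defaultdict

def visitors_to_exclude(visits, variable, max_occurrences):
    """Return visitor IDs matching the three-part exclusion rule.

    `visits` is an iterable of dicts, one per visit, with hypothetical keys:
    visitor_id, page_views, traffic_source, plus the chosen `variable`
    (e.g. an eVar holding the captured IP address).
    """
    # Group visits by the value of the chosen variable.
    by_value = defaultdict(list)
    for v in visits:
        value = v.get(variable)
        if value is not None:
            by_value[value].append(v)

    excluded = set()
    for group in by_value.values():
        if (
            len(group) > max_occurrences                        # 1. value seen more than X times
            and all(v["page_views"] <= 1 for v in group)        # 2. PVs per visit <= 1
            and all(v["traffic_source"] == "Typed/Bookmarked"   # 3. all visits typed/bookmarked
                    for v in group)
        ):
            excluded.update(v["visitor_id"] for v in group)
    return excluded
```

Note that a single visit with a campaign source or a second page view keeps the whole group from being excluded, which is the strict "for all visits" behavior described in the rule.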

We can solve for this in SQL, but I'm not sure it's doable in Adobe. Any thoughts?
