Hi,
We have an Adobe Experience Platform report suite that is currently receiving server calls from known bot user agents. Bot filtering prevents these from appearing in our front-end reporting.
Following Adobe documentation, we have updated our JS code to exclude user agents containing 'Googlebot' using the onBeforeEventSend callback; however, we are still receiving server calls.
Please could you review the code and advise if there are any amendments or additions to be made?
Thanks
Neil
Hmmm, the code to identify Googlebot looks solid; I am not sure about the onBeforeEventSend function though... I am not using WebSDK yet, so I can't check this...
But, as an alternate suggestion, and possibly one that might also make your site more efficient: instead of trying to abort the call before sending (meaning all your rule logic still runs), have you considered using a condition on your rules? If the user agent is Googlebot (or another identified bot), don't even run the rule, which in turn shouldn't trigger the tracking call at all:
if (navigator && navigator.userAgent && navigator.userAgent.includes("Googlebot")) {
  return true; // true = the exception matches, so the rule is skipped entirely
}
Basically, if the user agent contains Googlebot, return true, and since we are using an "Exception" rule, this should prevent the rule from running completely.
You would have to apply this to all rules, but it also means that if it's Googlebot, you aren't running all the JS to build out your tracking request....
@Neil_Thorpe my first question would be: are you using Adobe Analytics or Customer Journey Analytics?
Web SDK
Web SDK's onBeforeEventSend callback can be used to drop calls: if the callback returns false, the event is not sent.
However, I could imagine this approach growing significantly over time as more bot user agents need to be excluded.
Datastreams offer bot detection, but it will only flag the traffic, not block it.
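For illustration, a minimal sketch of how the callback could be wired into the configure command (the datastream and org IDs are placeholders, and the check covers only Googlebot):
alloy("configure", {
  edgeConfigId: "YOUR-DATASTREAM-ID", // placeholder
  orgId: "YOUR-ORG-ID@AdobeOrg",      // placeholder
  onBeforeEventSend: function(content) {
    // Returning false cancels the event before it leaves the browser.
    if (navigator.userAgent && navigator.userAgent.includes("Googlebot")) {
      return false;
    }
    // Returning nothing lets the event be sent as usual.
  }
});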
Adobe Analytics
Setting up bot rules in Adobe Analytics, or at least enabling the IAB bot detection rule and tweaking it as needed, should automatically detect and drop Google crawlers.
The IAB bot list does not typically solve the issue on its own. What I typically do is create a segment for crawlers that were not detected by the IAB list.
Mostly, this includes old browser versions, Linux as the operating system, or whatever else makes sense.
I can then apply this segment either to virtual report suite filters (where they exist) or at the Workspace panel level.
A little cumbersome, but unfortunately that is the reality when anyone can program their own crawler that uses the latest Chromium version and scrapes your website.
Customer Journey Analytics
In combination with the datastream bot detection, you will still have to filter the traffic out, as Jim Gordon explains in this video.
Can I ask, however, why you are trying to stop tracking bots completely, instead of using bot filtering, where you can still get basic information about which bots are hitting your site, which pages, and when they are crawling?
I often find this to be important information to delve into...
I do understand that it costs server calls, but if there is an issue on your site, it can take your operations team a while to parse through the server logs to see the bot activity...
Using the bot filters on your report suite keeps this data out of your reports, while still providing visibility into what is happening on your site in regards to bots...
@Jennifer_Dungan @bjoern__koth
Thanks so much for your help and advice.
To provide some context, we are not using the Adobe WebSDK - we write JS code and inject it into Google Tag Manager in order to manage Adobe Analytics tracking.
Also, the report suite that is generating the bot server calls is not our primary one. We have an SSL report suite that recognises bot and scraper user agents so that we can keep on top of it.
I'm not sure whether this makes a difference or not, but the event that's being tracked is not a page load, rather a custom link that's firing on the page.
Based on your advice, I will work with the developers here to look at aborting the call before sending.
Hi @Neil_Thorpe
your screenshot basically confirms that you are using Web SDK xD
Alloy.js / "alloy" is the new library that is used to send calls into the Experience Platform.
onBeforeEventSend is one of its callbacks.
The bot blocking approach should be used in all contexts, independent of whether it's a page view or link tracking call you are sending.
Though many times bots only scrape the page upon load and do not interact with it, so link tracking calls might be less affected.
Yes, "WebSDK" is not to be confused with using Adobe Launch (err... Adobe Data Collection).
AppMeasurement.js is the client side, "old school" way of tracking; Alloy.js / WebSDK is the server side tracking that relies on the XDM Stream. Both solutions can be controlled via Adobe's tag manager (Launch), or in GTM, or even deployed directly into the site code.
However, now that I know you are not using Launch, the "conditions" that I mentioned wouldn't work, but you should be able to do something similar with GTM (adding constraints onto the Page Views trigger, as in the sketch below) if you really want to stop tracking calls made by bots... the principle is essentially the same: add conditions to your rules rather than trying to abort the call at the end...
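A rough sketch of what that could look like, assuming a GTM Custom JavaScript variable (the variable name and bot list are purely illustrative):
// Hypothetical GTM Custom JavaScript variable, e.g. "JS - Is Known Bot".
// Use it as a trigger exception (or add a condition "equals false" to the
// Page View trigger) so the tag never fires for matching user agents.
function() {
  var ua = (navigator && navigator.userAgent) || "";
  return /Googlebot|bingbot|AhrefsBot|SemrushBot/i.test(ua);
}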
I would still think that the onBeforeEventSend would apply to both page view and action calls though... I wonder if you could add some sort of traceability, like adding console.logs inside the onBeforeEventSend function (to catch when that triggers), then again inside the navigator logic to catch when "Googlebot" is identified...
Something like:
onBeforeEventSend: function() {
  console.log("!!!!!!!!!!!!!! onBeforeEventSend triggered");
  console.log("!!!!!!!!!!!!!! " + navigator.userAgent);
  if (navigator && navigator.userAgent && navigator.userAgent.includes("Googlebot")) {
    console.log("!!!!!!!!!!!!!! Googlebot identified");
    return false; // returning false should cancel the event before it is sent
  }
}
You can pretend to be Googlebot through the use of a "User Agent Switcher" add-on for your browser... you can then test this and see what is happening.
If this looks like it might be working locally, then you might want to consider setting up some new dimensions to track the userAgent and other information that might help you track down which calls are not being caught by this logic.
Good luck!
@Jennifer_Dungan @bjoern__koth Thanks both for your continued support.
We have now implemented a more all-encompassing bot blocker in the alloy script on the initial page load, which should prevent subsequent scripts from running. I can test this by emulating the Googlebot user agent, and it seems to be working correctly. Hopefully the server calls will no longer come through!
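In simplified form, the approach looks something like this (illustrative only - the actual pattern list is broader):
// Gate the tracking bootstrap behind a broad bot check so that, for known
// crawlers, alloy and the dependent scripts never run at all.
// The regex below is an example list, not the production one.
var isBot = /bot|crawler|spider|crawl/i.test(navigator.userAgent);
if (!isBot) {
  // ...load the alloy base snippet and the rest of the tracking code here...
}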
Glad to hear the improvements seem to be working... I will keep my fingers crossed that this works in production.