Marketo Activity Ingestion - Understanding Behaviour of bulk extract | Community
DJ_Erraballi
Level 2
June 17, 2020
Solved

Marketo Activity Ingestion - Understanding Behaviour of bulk extract

  • June 17, 2020
  • 3 replies
  • 6179 views

Hi there, 


I've got a multi-part question:

Question #1 

 

I am currently debugging some issues with our Marketo activity feeds. I noticed recently that we received an activity from Marketo that looked like this:

 

result = {OrderedDict: 8}  
'marketoGUID' = {str} '149877905'
'leadId' = {str} '3173901'
'activityDate' = {str} '2020-06-05T23:08:07Z'
'activityTypeId' = {str} '1'
'campaignId' = {NoneType} None
'primaryAttributeValueId' = {str} '31707'
'primaryAttributeValue' = {str} 'www.multicare.org/photos/'
'attributes' = {NoneType} None 

This landing page activity didn't have the attributes set at all. (This is where I usually obtain the web page URL, the referrer URL if present, etc.) Is this expected to occur? We have been ingesting landing page activities since 2019 for multiple clients and this is the first time we have come across the above, so I am wondering if it is something that is likely to occur again.

Currently, not only do we depend on attributes being set, we also depend on 'Webpage URL' being present in the attributes field in order to properly report on data in Marketo.
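
A minimal sketch of a defensive read for this dependency, assuming each activity arrives as a dict whose 'attributes' value is either None or a JSON string (as in the dump above). The function name `webpage_url` is hypothetical, not part of any Marketo client library.

```python
import json


def webpage_url(activity):
    """Return the 'Webpage URL' attribute, or None if the row is incomplete."""
    attrs = activity.get("attributes")
    if not attrs:
        # Covers both a missing key and an explicit None, as in the row above.
        return None
    try:
        parsed = json.loads(attrs)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: treat the row as incomplete.
        return None
    if not isinstance(parsed, dict):
        return None
    return parsed.get("Webpage URL")


incomplete = {"marketoGUID": "149877905", "attributes": None}
complete = {"marketoGUID": "1", "attributes": '{"Webpage URL": "/photos/"}'}
```

Rows for which this returns None could be set aside for inspection instead of failing the whole ingest.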


 

Question #2

 

It appears that we are actually missing some data in our extract from Marketo, and we may need to tweak our strategy.

 

Currently we execute a bulk extract with a start date a couple of minutes before our most recent known stored activity. If we look at each extract as a time slice, consecutive slices overlap slightly, so every point in time is guaranteed to be covered by at least one extract.

 

What worries me is this: if I run a query like the following (dummy values) at 4:01pm:

startAt: 3:00pm
endAt: 4:00pm

I would get a DIFFERENT set of activities than if I made the same query at 4:16pm. (This is my current working theory for why it appears there are activities missing.)

Is it possible that Marketo can add out-of-order activities (with activity dates in a past time range)? If so, is there a recommended buffer to add to our start_at time, to ensure we don't miss any activities? Also, how long after a time range has elapsed could activities still be added to that time range?

Best answer by SanfordWhiteman

Yep, understood. It seems like there isn't going to be an easy way to regularly recapture missed ones without hitting our export quota. One option is to re-query historic 28-day ranges once per day, but I'm too scared of hitting export quotas for the clients.

 

Do a best effort, and hope those late-add activities weren't form submissions :/. 


Yeah, it's almost impossible to solve this. You do have to be aware of it, though: any dashboard is necessarily frozen in time and under some circumstances might be showing you a minority of the activities that would be shown if you re-downloaded later. For example, if you accidentally no-tracked a link, it wouldn't associate people's Munchkin sessions. Then in the future, a tracked link would associate the session and replay all their old activities.

3 replies

DJ_Erraballi
Level 2
June 17, 2020

Regarding question #1, I actually think this behaviour did change with this release, either intentionally or unintentionally:

https://docs.marketo.com/display/public/DOCS/Release+Notes%3A+June+%2720

SanfordWhiteman
Level 10
June 17, 2020

Activities are continually merged from the Anonymous side into the Known side of the database (and keep their original timestamp). 

DJ_Erraballi
Level 2
June 17, 2020

Thanks for the quick reply. That does answer question #2.

If data is added in an ongoing fashion, it does create some challenges for ensuring that exported activities and the activities in Marketo match up, especially if reprocessing time ranges eats away at our daily export quotas. But it does make sense, so I think what I will do is increase the start_at buffer to an hour. Hopefully that will capture a large enough share of the difference without having too large an impact on the quota.

Any ideas on question #1? 

Thanks for the help!
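
A rough sketch of the overlapping-window idea described above, assuming marketoGUID uniquely identifies an activity so that re-exported rows can be dropped on ingest. The helper names `next_window` and `merge_new` are hypothetical, and the one-hour default is just the buffer proposed in this post, not a recommended value.

```python
from datetime import datetime, timedelta, timezone


def next_window(last_activity_at, now, buffer=timedelta(hours=1)):
    """Return (start_at, end_at) for the next extract, overlapping the last one."""
    return last_activity_at - buffer, now


def merge_new(seen_guids, rows):
    """Keep only rows whose marketoGUID has not been stored yet."""
    fresh = []
    for row in rows:
        guid = row["marketoGUID"]
        if guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append(row)
    return fresh


last = datetime(2020, 6, 5, 23, 8, 7, tzinfo=timezone.utc)
now = datetime(2020, 6, 6, 0, 0, 0, tzinfo=timezone.utc)
start_at, end_at = next_window(last, now)

seen = {"149877905"}
rows = [{"marketoGUID": "149877905"}, {"marketoGUID": "149878023"}]
fresh = merge_new(seen, rows)
```

The overlap only helps with activities that show up within the buffer. Anything added to an older time range will still be missed unless that range is exported again.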

SanfordWhiteman
Level 10
June 17, 2020

"... start_at buffer to an hour and hopefully that will suck up enough of a percentage of the difference, without having too large of an impact on the quota."

... except activities can be updated months later.

 

As for #1, no idea yet, but I'll look into it when I can.

DJ_Erraballi
Level 2
June 19, 2020

It also looks like we have some data coming back from the bulk activity export with null activityDates. This is a little strange: not only had this never happened before June 5, but we are only experiencing this data issue with one of the 8 Marketo instances we work with.

DJ_Erraballi
Level 2
June 19, 2020

Here is the raw activity we saw: 

 

 

'marketoGUID' = {str} 'Education/IntroductionToCorporateCompliance_NonLMS/story.html'
'leadId' = {str} '{"Client IP Address":"REDACTED","Search Engine":"Gmail","Query Parameters":"","Referrer URL":null,"User Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36","Webpage URL":"/
'activityDate' = {NoneType} None
'activityTypeId' = {NoneType} None
'campaignId' = {NoneType} None
'primaryAttributeValueId' = {NoneType} None
'primaryAttributeValue' = {NoneType} None
'attributes' = {NoneType} None

 

It looks almost as if the activity export generated a completely faulty row: all the fields are null except marketoGUID, which appears to have the primary attribute value, and leadId, which appears to have the attributes.

DJ_Erraballi
Level 2
June 19, 2020

OK, so those two separate activities were actually the same row in the file.

I tracked this down to the record containing a Ctrl-K character (vertical tab, U+000B) in the URL, which breaks the CSV parsing:

149877905,3173901,2020-06-05T23:08:07Z,1,null,31707,www.multicare.org/photos/^KEducation/IntroductionToCorporateCompliance_NonLMS/story.html,"{""Client IP Address"":""REDACTED"",""Search Engine"":""Gmail"",""Query Parameters"":"""",""Referrer URL"":null,""User Agent"":""Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"",""Webpage URL"":""/photos/\u000BEducation/IntroductionToCorporateCompliance_NonLMS/story.html""}"

149878023,2928924,2020-06-05T23:09:25Z,10,1951,2598,Puget Sound-ES-202005-Essential and Financial Email.Email,"{""Choice Number"":""0"",""Campaign Run ID"":""670"",""Platform"":""Win7"",""Device"":""PC"",""Step ID"":""2530"",""User Agent"":""Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; BRI/1; BRI/2; Zoom 3.6.0; Microsoft Outlook 14.0.7248; ms-office; MSOffice 14)"",""Is Mobile Device"":false}"
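
One plausible way a Ctrl-K character can split a row like this in Python is sketched below. It assumes the export text was broken into lines with `str.splitlines()`, which treats U+000B as a line boundary. That is an assumption about the parsing path, not something confirmed in this thread. The sample row is a shortened stand-in for the real one, and the column list simply mirrors the field names shown earlier.

```python
import csv
import io

FIELDS = ["marketoGUID", "leadId", "activityDate", "activityTypeId",
          "campaignId", "primaryAttributeValueId", "primaryAttributeValue",
          "attributes"]

# Shortened stand-in for the problem row; \x0b is Ctrl-K (vertical tab).
raw = (
    "149877905,3173901,2020-06-05T23:08:07Z,1,null,31707,"
    'www.multicare.org/photos/\x0bEducation/story.html,'
    '"{""Webpage URL"":""/photos/""}"\n'
)


def parse_with_splitlines(text):
    # str.splitlines() also splits on \x0b, so one CSV row becomes two.
    return list(csv.DictReader(text.splitlines(), fieldnames=FIELDS))


def parse_with_stream(text):
    # A text stream with newline="" only ends lines on \n, \r or \r\n.
    stream = io.StringIO(text, newline="")
    return list(csv.DictReader(stream, fieldnames=FIELDS))


broken = parse_with_splitlines(raw)
intact = parse_with_stream(raw)
```

With the first parser the row comes back as two records shaped like the ones earlier in the thread: one with no attributes, and one with the rest of the URL in marketoGUID. With the second parser it stays a single record and the control character remains inside primaryAttributeValue, where it can be stripped or kept as needed.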