How can I prevent spam leads from entering Marketo? | Community
Julia_Campbell2
Level 2
July 26, 2019
Solved

How can I prevent spam leads from entering Marketo?


Hi folks,

We're starting to experience some spam on our blog forms and are looking for a solution. I've seen articles about ReCaptcha and honeypots, but am not sure that either alone will address the root issue, so I'm hoping there's a combined approach that would. We are proactively trying to protect our global forms before any potential escalation in spam attack volume.

My understanding (please correct me if I'm wrong) is that the ReCaptcha implementation found here does not prevent leads from entering Marketo. Instead, the data from the ReCaptcha is webhooked into Marketo and appended to the Lead record. You can then use the data to delete spam leads through a flow.

My understanding is also that honeypot fields are easy for a dedicated spammer to identify (even if they don't have an obvious name) and bypass. That said, this article implies that a honeypot can be used to prevent form submits from even happening - a desired result.

Goal:

Prevent Spam lead data from entering Marketo. This could look like spam leads not being able to submit Marketo forms OR preventing the data from form submits from reaching Marketo.

This is to make sure that:

  • Marketo's API is not impacted by sudden high inbound volume
  • Campaigns, etc. do not trigger and impact the API - with the current system setup, they would have to be updated one by one to filter out leads flagged as spam by ReCaptcha data
  • Triggers, etc. are not delayed by a backlog
  • We don't need ongoing system cleansing for spam leads, especially at high volume

Is this a viable solution?

  • Implement a hidden simple boolean true/false ReCaptcha field on the Marketo form
  • Include JavaScript similar to the honeypot article linked above, but for the ReCaptcha
  • If an automated spam script fills out the form, including the hidden ReCaptcha field, this will trigger the JavaScript to prevent the form from being able to submit OR filter out the data from ever reaching Marketo
  • Standard non-Spam leads will not need to fill out the ReCaptcha (e.g. if ReCaptcha is TRUE, the lead is Spam) and will pass through to Marketo

If this is not possible, is there some way to use a proxy in tandem with Marketo forms to prevent syncing bad data to the system? Other solutions?

Thanks so much for any help and ideas!

Cheers,

Julia

P.S. @Sanford Whiteman, tagging you since I know you've been an invaluable resource on past ReCaptcha questions.

SanfordWhiteman
Accepted solution
Level 10
July 26, 2019

First, honeypot fields are ridiculous. Worse than useless. Anyone who continues to champion them just doesn't understand how forms work, or how the web (including malicious and legit actors) works in general.

Second, realize that reCAPTCHA never (in any system, not just Marketo) stops non-humans from submitting form data.  It cannot ever do this, because malicious actors do not use JavaScript. reCAPTCHA relies on an end user fingerprint, generated using JS and verified on your server via webhook, to determine whether the submission was from a human or machine (or, in the latest v3, whether they tilt toward human or machine, instead of a binary distinction). 

So reCAPTCHA can be used to intuit whether form data was submitted by human or machine, but it doesn't stop the data from being submitted.
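To make the point above concrete, here is a minimal Python sketch of the server-side half of reCAPTCHA. It is an illustration, not the implementation discussed in this thread. The `siteverify` URL and the `secret`/`response` parameters are Google's documented verification call; the function names and the `min_score` threshold of 0.5 are assumptions made up for this example. Note that a bot posting straight to the form endpoint simply arrives with no valid token, so the check comes back "invalid" only after the data has already been submitted.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str) -> urllib.request.Request:
    """Build the POST that asks Google to verify a client-generated token."""
    body = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def classify(result: dict, min_score: float = 0.5) -> str:
    """Map a siteverify JSON result to 'human', 'bot', or 'invalid'.

    min_score is an illustrative threshold, not a recommended value.
    """
    if not result.get("success"):
        return "invalid"          # missing, expired, or forged token
    score = result.get("score")   # only v3 responses carry a score
    if score is None:
        return "human"            # v2: success is a binary pass
    return "human" if score >= min_score else "bot"


def verify(secret: str, token: str) -> str:
    """Network call; this has to run on a server, never in the browser."""
    request = build_verify_request(secret, token)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return classify(json.load(resp))
```

In a Marketo setup the equivalent of `verify` would sit behind the webhook, and its verdict would be written back to the lead record.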

Now, in Marketo, you have the less-than-optimal reality that once form data is submitted, a lead is upserted before any other inspection of the payload can be done. In other systems, the form data can be inspected, again after submission, but before it enters the next stage of processing. In Marketo you can only inspect and attempt to subvert/revert the actions done before you're given a chance to check the reCAPTCHA fingerprint.

When plugging reCAPTCHA into Marketo, you need to tune your workflow so that form intake processes work in serial, not in parallel, so you always have control over the next step. You need to make sure that not just reCAPTCHA, but other prerequisites like an SMTP verify webhook, have a positive outcome before letting people move to the next step (i.e. making robust use of Request Campaign and the Webhook is Called trigger).  I just rolled out a robust reCAPTCHA implementation for a client that was a huge net win, because it taught them a lot about rogue processes they didn't even realize were running, in random order, on every form fill! The end result was a workflow that's (mostly) self-documenting and stops non-human leads from entering the system.
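The "serial, not parallel" idea can be sketched outside Marketo as a chain of gates, where each step runs only if the previous one passed. This is a conceptual Python sketch only: in Marketo the chaining would be done with Request Campaign and the Webhook is Called trigger, and the field names `recaptcha` and `smtp_ok` below are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable

Lead = dict
Check = Callable[[Lead], bool]


def run_intake(lead: Lead, checks: Iterable[tuple[str, Check]]) -> tuple[bool, list[str]]:
    """Run checks strictly in order and stop at the first failure."""
    passed: list[str] = []
    for name, check in checks:
        if not check(lead):
            return False, passed   # later steps never run for this lead
        passed.append(name)
    return True, passed


# Hypothetical stand-ins for webhook results already written to the lead.
checks = [
    ("recaptcha", lambda lead: lead.get("recaptcha") == "human"),
    ("smtp_verify", lambda lead: lead.get("smtp_ok") is True),
]
```

Because the list is ordered, it also documents the intake sequence, which is roughly what "self-documenting" means here.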

Is this a viable solution?

 

  • Implement a hidden simple boolean true/false ReCaptcha field on the Marketo form
  • Include JavaScript similar to the honeypot article linked above, but for the ReCaptcha
  • If an automated spam script fills out the form, including the hidden ReCaptcha field, this will trigger the JavaScript to prevent the form from being able to submit OR filter out the data from ever reaching Marketo
  • Standard non-Spam leads will not need to fill out the ReCaptcha (e.g. if ReCaptcha is TRUE, the lead is Spam) and will pass through to Marketo

No, this does not make sense. There's no such thing as a reCAPTCHA that operates entirely on the client side, and the last thing you want is a reCAPTCHA that acts more like a honeypot (i.e. is more fake)!

Julia_Campbell2
Level 2
July 29, 2019

Hi @Sanford Whiteman,

Love this - THANK YOU for such a detailed response! We're working on scoping a project to re-architect our Marketo instance to allow for more streamlined processes. Right now we have no control over what's firing when, so we need to start daisy chaining with Request Campaign and Webhook is Called even outside of the ReCaptcha issue. We'll make sure to factor this in as well.

"The end result was a workflow that's (mostly) self-documenting and stops non-human leads from entering the system." 

Is this referring to SFDC or other CRM as "the system" rather than Marketo? If we're able to use the triggers you mentioned in Marketo, my understanding is the form data/lead record should be in Marketo already, but beyond that point we can control how and where the data flows (i.e. not to the CRM). 

If this understanding is correct, and the data is already in Marketo, then it sounds like there's no way to prevent the form data from entering Marketo when we're using Marketo forms on landing pages? In other words, our only option to prevent an attack from reaching our systems in the first place would be to use non-Marketo forms so the data can be inspected before pushing into Marketo?

SanfordWhiteman
Level 10
July 29, 2019
"The end result was a workflow that's (mostly) self-documenting and stops non-human leads from entering the system." 

 

Is this referring to SFDC or other CRM as "the system" rather than Marketo?

Yes, but a better phrasing would've been "stops non-human leads from entering CRM, and stops them from entering any additional Marketo flows if they do not first pass verification."  The client in this case quarantines the flagged leads and deletes them using a nightly batch.
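A quarantine-then-delete batch of the kind described could look roughly like the following Python sketch. The `quarantined` flag, the lead dictionaries, and the batch size of 300 are all assumptions for illustration; the actual client setup is not described in this thread beyond "quarantine and nightly batch".

```python
def quarantined_ids(leads):
    """Return the ids of leads that failed verification and were flagged."""
    return [lead["id"] for lead in leads if lead.get("quarantined")]


def chunked(ids, size=300):
    """Split ids into batches; 300 is an assumed per-request limit."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def nightly_delete(leads, delete_batch):
    """Delete every quarantined lead, one batch at a time.

    delete_batch is a caller-supplied function (hypothetical) that performs
    the real deletion, for example through an API call.
    """
    deleted = 0
    for batch in chunked(quarantined_ids(leads)):
        delete_batch(batch)
        deleted += len(batch)
    return deleted
```

Keeping the deletion itself behind `delete_batch` means the selection logic can be checked on a report before anything is actually removed.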

Pratyusha_Ram1
Level 2
July 29, 2019

We faced this issue multiple times over the last couple of months. Here's what we observed - initially, we saw a surge of 10-15k leads per day and these ended up as fake handraisers. This is what alerted us as we saw a sudden spike in the daily MQL reports.  

Naturally, we looped in our Digital Marketing team, as the leads seemed to be sourced from a particular form. They jumped right in with several proposed fixes -

  • Expansion of the existing honeypot solution 
  • reCaptcha
  • IP and email domain blocking on the web forms 

But nothing seemed to stop the incoming spam leads. That is when we compared the Google Analytics stats and noticed that the incoming web page hits didn't match the number of incoming spam leads. This raised the suspicion that the spam records could be hitting the MKTO endpoint directly.
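The comparison described above can be automated with a simple daily check. This Python sketch assumes you can export per-day page views for the form's page and per-day form fills as two dictionaries keyed by date; the function name and the `max_ratio` parameter are made up for the example.

```python
from __future__ import annotations


def suspicious_days(page_hits: dict[str, int],
                    form_fills: dict[str, int],
                    max_ratio: float = 1.0) -> list[str]:
    """Return days on which form fills exceed what page traffic can explain.

    max_ratio is the highest fills-per-page-view ratio treated as plausible;
    1.0 means "more fills than views is impossible through the page itself".
    """
    flagged = []
    for day, fills in sorted(form_fills.items()):
        hits = page_hits.get(day, 0)
        if fills > hits * max_ratio:
            flagged.append(day)
    return flagged
```

A flagged day suggests submissions that never loaded the page, which is consistent with posts going directly to the form endpoint.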

We now have 

  • a daily report of the incoming leads matching the spam leads criteria
  • a campaign to mark these incoming spam leads as invalid, so they are not processed and progressed to the next stages
  • an ongoing effort to delete these leads - not just from MKTO, but the integrated systems as well 

But this is not viable. So we got on a call with the MKTO team to discuss the issue and check whether IP blocking or anything similar was possible. Apparently not. They told us that a long-term solution is being implemented and will be rolled out in Q1 2020 (tentative :-( and this was 2 months ago!). They recommended that we delete the affected form and create a new one, but this is not going to make the system any less vulnerable: it takes only a few seconds to try different form IDs, and two are already exposed.

We are actively in discussion with MKTO CSM and Products team. Let me know if you'd like to discuss the details and I'm happy to jump on a call!  

SanfordWhiteman
Level 10
July 29, 2019

You can't have implemented reCAPTCHA correctly, simple as that. Properly integrated into your flow, no reCAPTCHA-failing lead will ever be flagged as a handraiser.

And as noted above, honeypot has never worked against an even mildly savvy attacker, so any attempt to "extend" it wouldn't do anything.

Pratyusha_Ram1
Level 2
July 30, 2019

Thanks, Sanford Whiteman. This makes sense when the attacks are happening on a web form directly. How do you handle a scenario where the endpoint might be exposed? Spam leads might be going straight past the reCaptcha and the form.

Sreekanth_Reddy
Adobe Employee
July 30, 2019

Hi guys, I am Sreekanth from the Product team at Marketo.

Sanford Whiteman - Great insights there. Would love to know your thoughts on how Marketo can help here (possibly using some AI/ML capabilities as well).

@Pratyusha Ram @Julia Campbell  - Would love to know more details on your current plans/implementation to tackle this scenario. 

Happy to connect with everyone. If you would like to connect, please send a note at sreekanth.reddy@adobe.com

SanfordWhiteman
Level 10
July 30, 2019

No additional AI/ML is necessary -- reCAPTCHA already is built on machine learning!

What you need is a way for Marketo users to plug in their reCAPTCHA keypair (generated in their own Google reCAPTCHA Console) and have the system validate the user response (client fingerprint) before posting the form data to Marketo, using the underlying HTTP stack directly rather than a user-defined webhook.

This is considerably complex to do correctly, because you must give each user control over what return value -- esp. in reCAPTCHA v3, which returns a confidence level rather than a hard binary bot/not -- is enough to delete the form data entirely.  You need to offer a training mode, where you only tag inbound leads with their reCAPTCHA result, not delete them. And you need to also let the user audit reCAPTCHA results over time. Remember, it's not just one "reCAPTCHA" score because the same email address can be associated with multiple sessions and results.
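The requirements in that paragraph (a per-user threshold, a training mode, and an audit trail across sessions) can be sketched in Python as follows. This is not a Marketo feature or API; every class, field, and function name here is hypothetical and exists only to show how the pieces relate.

```python
from dataclasses import dataclass, field


@dataclass
class FormPolicy:
    discard_below: float        # v3 score under which form data is dropped
    training_mode: bool = True  # while True, only tag; never discard


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, email, session_id, score, action):
        self.entries.append((email, session_id, score, action))

    def history(self, email):
        """All results for one address; one email can span many sessions."""
        return [entry for entry in self.entries if entry[0] == email]


def decide(policy: FormPolicy, score: float) -> str:
    """Return 'accept', 'tag', or 'discard' for a single submission."""
    if score >= policy.discard_below:
        return "accept"
    return "tag" if policy.training_mode else "discard"
```

Running in training mode first lets a user review the logged scores and pick a `discard_below` value before any data is actually thrown away.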

If you pivoted and simply built a "pre-database webhook" functionality instead, without draconian rate limits, and allowed us to discard the data based on the response (so it never entered the db) that would be sufficient.