Query to validate valid emails. | Community
Skip to main content
July 30, 2024
Solved

Query to validate valid emails.

  • July 30, 2024
  • 4 replies
  • 1348 views

I am creating a query to obtain a list of emails from a dataset that is contaminated with invalid email formats, I have a query with many filters but there are many that cannot obtain the filters and when using the programmed query it generates an error. ingestion into the dataset. (Format validation for field :personalEmail.address of type: String failed. The input value is daimer100@gmail.conm and expected format was: email. Please update the field and try again.) The list of errors is quite extensive and varied, not I see it viable to do this through query filters, how else could I do this query?

QUERY:

WITH valid_emails AS ( SELECT uuid() AS _id, CASE WHEN identityMap['email'][0]['id'] LIKE '%_@__%.__%' AND identityMap['email'][0]['id'] ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$' -- Validación RFC2822 AND NOT (identityMap['email'][0]['id'] LIKE '%..%') -- No contiene puntos consecutivos AND NOT (identityMap['email'][0]['id'] LIKE '%.@%') -- No contiene un punto antes de la arroba AND NOT (identityMap['email'][0]['id'] LIKE '%.%@%') -- No contiene un punto antes o después de la arroba AND NOT (identityMap['email'][0]['id'] LIKE '%@%.') -- No contiene un punto después de la arroba AND NOT (identityMap['email'][0]['id'] LIKE '%.con') -- No termina en .con AND NOT (identityMap['email'][0]['id'] LIKE '%/%') -- No contiene una barra inclinada THEN named_struct('address', identityMap['email'][0]['id']) WHEN identityMap['email'][1]['id'] LIKE '%_@__%.__%' AND identityMap['email'][1]['id'] ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$' -- Validación RFC2822 AND NOT (identityMap['email'][1]['id'] LIKE '%..%') -- No contiene puntos consecutivos AND NOT (identityMap['email'][1]['id'] LIKE '%.@%') -- No contiene un punto antes de la arroba AND NOT (identityMap['email'][1]['id'] LIKE '%.%@%') -- No contiene un punto antes o después de la arroba AND NOT (identityMap['email'][1]['id'] LIKE '%@%.') -- No contiene un punto después de la arroba AND NOT (identityMap['email'][1]['id'] LIKE '%.con') -- No termina en .con AND NOT (identityMap['email'][1]['id'] LIKE '%/%') -- No contiene una barra inclinada THEN named_struct('address', identityMap['email'][1]['id']) WHEN identityMap['email'][2]['id'] LIKE '%_@__%.__%' AND identityMap['email'][2]['id'] ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$' -- Validación RFC2822 AND NOT (identityMap['email'][2]['id'] LIKE '%..%') -- No contiene puntos consecutivos AND NOT (identityMap['email'][2]['id'] LIKE '%.@%') -- No contiene un punto antes de la arroba AND NOT (identityMap['email'][2]['id'] LIKE '%.%@%') -- No contiene un punto antes o después de la arroba AND NOT (identityMap['email'][2]['id'] LIKE '%@%.') -- No contiene un punto después de la arroba AND NOT (identityMap['email'][2]['id'] LIKE '%.con') -- No termina en .con AND NOT (identityMap['email'][2]['id'] LIKE '%/%') -- No contiene una barra inclinada THEN named_struct('address', identityMap['email'][2]['id']) -- Agrega más condiciones si hay más posiciones ELSE NULL END AS personalEmail FROM profile_snapshot_export_*profile_snapshot_number* WHERE segmentmembership['ups']['*segmentmembership NUMBER*'].status IN ('realized', 'existing') AND (identityMap['email'][0]['id'] IS NOT NULL OR identityMap['email'][1]['id'] IS NOT NULL OR identityMap['email'][2]['id'] IS NOT NULL -- Agrega más condiciones si hay más posiciones ) ) SELECT _id, personalEmail FROM valid_emails WHERE personalEmail IS NOT null;



This post is no longer active and is closed to new replies. Need help? Start a new post to ask your question.
Best answer by DavidKangni

@juan_sebastianqu 

Adobe is using IANA TLD to validate and insert email in AJO when using the personalEmail.address

 

TLD stands for Top Level Domain commonly known as extension .com .info .org etc

 

During the insertion, it's checking the format (email regex or %@%.% to make it easier) and then the extension. If the extension is not in IANA TLD then your email will fail the validation e.g .conm

 

To reduce the failures, I will suggest you to implement a email regex validation at the collection point.

 

Thanks,

David

 

4 replies

Pradeep_Kumar_Srivastav
Community Advisor
Community Advisor
July 31, 2024

Hi @juan_sebastianqu ,

 

Well, I tried with a complex regex pattern and it failed. In many SQL implementations,  especially those used in services like Adobe Experience Platform (AEP), more complex/advanced regex functionality might not be supported directly in queries. We will need to use basic SQL pattern matching with LIKE 
We have to keep adding multiple patterns to get all invalid email formats instead of complex regEx like - email !~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

 

SatheeskannaK
Community Advisor
Community Advisor
July 31, 2024

@juan_sebastianqu I would suggest consider creating a query to retrieve emails from a dataset excluding domains such as "gmail.com" and "yahoo.com". The criteria should focus on identifying fewer valid domains in comparison to the N number of invalid domains.

Thanks, Sathees
DavidKangni
Community Advisor
DavidKangniCommunity AdvisorAccepted solution
Community Advisor
July 31, 2024

@juan_sebastianqu 

Adobe is using IANA TLD to validate and insert email in AJO when using the personalEmail.address

 

TLD stands for Top Level Domain commonly known as extension .com .info .org etc

 

During the insertion, it's checking the format (email regex or %@%.% to make it easier) and then the extension. If the extension is not in IANA TLD then your email will fail the validation e.g .conm

 

To reduce the failures, I will suggest you to implement a email regex validation at the collection point.

 

Thanks,

David

 

David Kangni
Level 2
March 19, 2025

so is it enough if we just check the domains of the email, and verify whether it falls under this domain or not.???

Sukrity_Wadhwa
Community Manager
Community Manager
August 12, 2024

Hi @juan_sebastianqu,

Were you able to resolve this query with the help of the given solutions or do you still need more help here? Do let us know. In case the given solutions were helpful, then kindly choose the one that helped you the most as the 'Correct Reply'.
Thanks!

Sukrity Wadhwa