SOLVED

Classification Rule Builder - Regular Expression Help

Level 4

I am updating an existing implementation that uses a colon delimiter. One of the fields is a text field that captures the text of various items, some of which are article names. The issue is that some article titles contain a colon, which is throwing off the existing rule.

Here is the expression I am using. 

(?i)^(.*?):(.*):(.*):(.*)(?-i)

What I am trying to achieve is the following:

  • $0 - Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined

Desired result (note the colon inside $3):

  • $1 - Trending Articles 123
  • $2 - Link
  • $3 - Emerging Issues Executive Quarterly: Insurance in the Inflationary Era
  • $4 - Undefined

It is currently returning:

  • $0 - Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:
  • $1 - Trending Articles 123
  • $2 - Link:Emerging Issues Executive Quarterly
  • $3 - " Insurance in the Inflationary Era"
  • $4 - ""

Any help modifying the regular expression is welcome: (?i)^(.*?):(.*):(.*):(.*)(?-i)
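
To see why the rule splits where it does, here is a minimal Python sketch of the same pattern. Python's `re` module is only a stand-in for Adobe's rule engine (assumed to treat greedy and lazy quantifiers the same way for this pattern), and the (?i)/(?-i) flags are left out since the pattern contains no letters.

```python
import re

# Sample value as tested in the post (the fourth field is empty)
value = "Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:"

# Original rule without the (?i)/(?-i) inline flags
original = re.compile(r"^(.*?):(.*):(.*):(.*)")

groups = original.match(value).groups()
for i, g in enumerate(groups, start=1):
    print(f"${i} = {g!r}")
# $1 = 'Trending Articles 123'
# $2 = 'Link:Emerging Issues Executive Quarterly'
# $3 = ' Insurance in the Inflationary Era'
# $4 = ''
```

The greedy second group grabs as much as it can while still leaving two colons for the rest of the pattern, which reproduces the split reported above.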
1 Accepted Solution

Correct answer by
Community Advisor

Hi,

I am not sure what the (?i) at the beginning and the (?-i) at the end are supposed to be doing in your regex; when I tried them in an online regex tester, both came back as invalid.
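
For context: in regex flavours such as PCRE and Java, (?i) is an inline flag that turns on case-insensitive matching and (?-i) turns it off again. JavaScript's flavour does not accept the bare (?i) form, which may explain the "invalid" result in a browser-based tester; whether Adobe's Rule Builder honours these flags is not confirmed here. A minimal Python sketch (Python is only a stand-in and accepts (?i) at the start of a pattern) shows that, because the rule contains no letters, the flag cannot change its groups:

```python
import re

value = "Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:"

# (?i) at the very start makes the whole pattern case-insensitive
print(bool(re.match(r"(?i)link", "LINK")))  # True
print(bool(re.match(r"link", "LINK")))      # False

# The rule has no letters, so the flag does not change its groups
with_flag = re.match(r"(?i)^(.*?):(.*?):(.*):(.*)$", value).groups()
without_flag = re.match(r"^(.*?):(.*?):(.*):(.*)$", value).groups()
print(with_flag == without_flag)  # True
```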

However, using the string

"Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:"

with this regex:

^(.*?):(.*?):(.*):(.*)$

results in:

  • $0 - Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:
  • $1 - Trending Articles 123
  • $2 - Link
  • $3 - Emerging Issues Executive Quarterly: Insurance in the Inflationary Era
  • $4 - ""
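
The same result can be reproduced outside the Rule Builder. Here is a minimal Python sketch (Python's `re` stands in for Adobe's engine, assuming lazy and greedy quantifiers behave the same for this pattern) that applies the corrected rule to the sample above and to the original sample ending in ":Undefined":

```python
import re

# Corrected rule: $1 and $2 lazy, $3 greedy, $4 takes whatever follows the last colon
rule = re.compile(r"^(.*?):(.*?):(.*):(.*)$")

samples = [
    "Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:",
    "Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined",
]

results = [rule.match(s).groups() for s in samples]
for r in results:
    print(r)
# ('Trending Articles 123', 'Link', 'Emerging Issues Executive Quarterly: Insurance in the Inflationary Era', '')
# ('Trending Articles 123', 'Link', 'Emerging Issues Executive Quarterly: Insurance in the Inflationary Era', 'Undefined')
```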

Basically, (.*?) in the regex means: "." matches any character except line breaks, "*" matches 0 or more of them, and "?" makes that quantifier lazy, so it matches as little as possible.
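
To make the lazy/greedy difference concrete, here is a small Python sketch (Python's `re` is used only for illustration) on a fragment of the sample value: the lazy group stops at the first colon, while the greedy one takes everything and then backs off to the last colon.

```python
import re

fragment = "Link:Emerging Issues Executive Quarterly: Insurance"

# Lazy: stop at the first colon
lazy = re.match(r"(.*?):", fragment).group(1)
# Greedy: grab everything, then back off to the last colon
greedy = re.match(r"(.*):", fragment).group(1)

print(lazy)    # Link
print(greedy)  # Link:Emerging Issues Executive Quarterly
```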

In your original regex (ignoring the (?i) and (?-i)), this meant the first colon in your string forced the first group to stop there.

But the next group (the one meant to capture "Link") wasn't lazy. A greedy (.*) grabs as much as it can while still leaving enough colons for the rest of the pattern, and because your text had one more colon than the rule expected, this group absorbed the extra text ("Link:Emerging Issues Executive Quarterly"). I just made the second group lazy as well, so it stops right after "Link". The third group is still greedy, so it now takes the article title, including ALL extra colons, up to the last colon in the string; whatever follows that last colon becomes your $4.
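
Here is a quick Python check of that behaviour on hypothetical values (not from the original post). It also shows the assumption the rule relies on: only the third field may contain colons, so a colon in the last field would be pulled into $3 instead.

```python
import re

rule = re.compile(r"^(.*?):(.*?):(.*):(.*)$")

# Hypothetical title containing two extra colons: all of them stay in $3
many_colons = rule.match("Trending Articles 123:Link:Part 1: Part 2: Part 3:Undefined").groups()
print(many_colons)
# ('Trending Articles 123', 'Link', 'Part 1: Part 2: Part 3', 'Undefined')

# Hypothetical colon in the LAST field: the greedy $3 takes it instead
colon_in_last = rule.match("Trending Articles 123:Link:Title:Un:defined").groups()
print(colon_in_last)
# ('Trending Articles 123', 'Link', 'Title:Un', 'defined')
```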

I hope this helps.


Level 4

Not only did this help, it also addressed some issues that I had not yet discovered. Kudos!

Community Advisor

So glad that helped you. 

FYI, for the record, I like to use http://www.regextester.com/ to test all my regex rules before I even start building in Adobe's Rule Builder.

This lets me test multiple examples at once (you have to turn on the multi-line option, so that ^ and $ match at the start and end of each line) to see how the rule is shaping up. Then you can build the rule with one sample, confirm the groups that Adobe shows, and do a final Test Rules run with multiple samples again.
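
The same multi-sample check can be scripted. This is a minimal Python sketch, not Adobe's tooling: `re.MULTILINE` plays the role of the tester's multi-line option, and the middle sample line is a hypothetical value added for illustration.

```python
import re

# MULTILINE lets ^ and $ match at every line break, like a tester's multi-line option
rule = re.compile(r"^(.*?):(.*?):(.*):(.*)$", re.MULTILINE)

# One sample per line; the middle line is a hypothetical value
samples = """Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined
Trending Articles 123:Link:Some Article Title:Undefined
Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:"""

# "." never matches a newline, so each match stays within its own line
rows = [m.groups() for m in rule.finditer(samples)]
for row in rows:
    print(row)
```

Each printed tuple corresponds to one sample line, which mirrors checking several examples at once before running Test Rules in the Rule Builder.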