Classification Rule Builder - Regular Expression Help | Community
Level 3
October 11, 2023
Solved


  • October 11, 2023
  • 1 reply
  • 947 views

I am updating an existing implementation that uses a colon delimiter. One of our fields is a text field that captures the text of various items, some of which are article names. The problem is that some article titles contain a colon, which throws off the existing rule.

 

Here is the expression I am using. 

(?i)^(.*?):(.*):(.*):(.*)(?-i)

 

 

What I am trying to achieve is the following:

 

$0Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined 

Expected result (note the colon in $3):

 
$1Trending Articles 123
$2Link
$3Emerging Issues Executive Quarterly: Insurance in the Inflationary Era
$4Undefined 
 
It is currently returning: 
$0Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:
$1Trending Articles 123
$2Link:Emerging Issues Executive Quarterly
$3 Insurance in the Inflationary Era
$4
 
Any help modifying the regular expression is welcome: (?i)^(.*?):(.*):(.*):(.*)(?-i)
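A minimal reproduction of the problem, sketched in Python's `re` module. This is an assumption for illustration only: Adobe's Rule Builder may use a different regex engine, but greedy and lazy quantifiers should behave the same way for a pattern this simple. The trailing `(?-i)` is left out because Python rejects a bare `(?-i)`.

```python
import re

# Original rule; the trailing (?-i) is dropped because Python only accepts
# the scoped form (?-i:...), not a bare (?-i).
original = re.compile(r"(?i)^(.*?):(.*):(.*):(.*)")

key = ("Trending Articles 123:Link:"
       "Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:")

groups = original.match(key).groups()
for n, value in enumerate(groups, start=1):
    print(f"${n} = {value!r}")
# $1 = 'Trending Articles 123'
# $2 = 'Link:Emerging Issues Executive Quarterly'
# $3 = ' Insurance in the Inflationary Era'
# $4 = ''
```

The greedy second group grabs everything up to the second-to-last colon, which is exactly the "currently returning" output above.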
Best answer by Jennifer_Dungan


1 reply

Jennifer_Dungan
Community Advisor and Adobe Champion
Accepted solution
October 11, 2023

Hi,

 

I am not sure what the (?i) at the beginning or the (?-i) at the end are supposed to do in your regex; when I used it in an online regex tester, both came back as invalid.
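For context, and hedged because it depends on the regex engine: in PCRE- and Java-style flavors, `(?i)` is an inline modifier that turns on case-insensitive matching and `(?-i)` turns it back off, while JavaScript (which many browser-based testers use) rejects both. Either way, case-insensitivity cannot change the result of this particular pattern, because it contains no letters, only `.`, `*`, `?` and `:`. A quick check in Python (which accepts a leading `(?i)`):

```python
import re

key = ("Trending Articles 123:Link:"
       "Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:")

with_flag = re.match(r"(?i)^(.*?):(.*):(.*):(.*)", key).groups()
without_flag = re.match(r"^(.*?):(.*):(.*):(.*)", key).groups()

# The pattern has no letters for (?i) to act on, so the groups are identical.
print(with_flag == without_flag)  # True
```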

 

However, using the string

"Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:"

 

with this regex:

^(.*?):(.*?):(.*):(.*)$

 

results in:

  • $0 - Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:
  • $1 - Trending Articles 123
  • $2 - Link
  • $3 - Emerging Issues Executive Quarterly: Insurance in the Inflationary Era
  • $4 - ""
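The same result can be checked outside the Rule Builder. A short sketch using Python's `re` (an assumption; the Rule Builder's own engine may differ) applies the suggested pattern to the same string:

```python
import re

fixed = re.compile(r"^(.*?):(.*?):(.*):(.*)$")

key = ("Trending Articles 123:Link:"
       "Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:")

m = fixed.match(key)
print(m.group(0) == key)  # True: $0 is the whole string
for n in range(1, 5):
    print(f"${n} - {m.group(n)!r}")
```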

 

Basically, each part of (.*?) has a job: . means "match any character except line breaks", * means "match 0 or more times", and ? makes it a lazy quantifier, "match as few characters as possible".
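To see lazy versus greedy in isolation, here is a tiny Python illustration on a made-up string `a:b:c` (hypothetical data, not from the post):

```python
import re

text = "a:b:c"

lazy = re.match(r"(.*?):", text).group(1)   # stops at the first colon
greedy = re.match(r"(.*):", text).group(1)  # backs off only as far as the last colon

print(lazy)    # a
print(greedy)  # a:b
```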

 

In your original regex (minus the odd parts), that meant the first group stopped as soon as it reached the first colon in your string.

 

But the second group (Link) was not lazy. A greedy (.*) takes as much as it can while still letting the rest of the pattern match, and because your text had one more colon than the pattern expects, that group swallowed the extra text ("Link:Emerging Issues Executive Quarterly"). I just made the second group lazy too, so it stops right after "Link". The third group (the article title) is still greedy, so it now takes ALL the extra colons up to the last one in the string, and whatever follows that last colon becomes your part 4.
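Because only the third group is greedy, it absorbs every extra colon, and the fourth group gets whatever follows the last colon. A sketch in Python; the helper name `split_key` and the second sample title are hypothetical, chosen to show a title with two colons:

```python
import re

FIXED = re.compile(r"^(.*?):(.*?):(.*):(.*)$")

def split_key(key: str):
    """Return ($1, $2, $3, $4) for a key, or None if it has fewer than three colons."""
    m = FIXED.match(key)
    return m.groups() if m else None

print(split_key("Trending Articles 123:Link:"
                "Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined"))
print(split_key("Trending Articles 123:Link:Part One: Part Two: Part Three:Undefined"))
print(split_key("Trending Articles 123:Link"))  # too few colons -> None
```

The assumption behind this split is that only the third field may contain colons; a colon inside the first, second, or fourth field would still be split in the wrong place.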

 

I hope this helps

Level 3
October 12, 2023

Not only did this help, it also addressed some issues that I had not yet discovered. Kudos!

Jennifer_Dungan
Community Advisor and Adobe Champion
October 12, 2023

So glad that helped you. 

 

FYI, for the record, I like to use http://www.regextester.com/ to test all my regex rules before I even start building in Adobe's Rule Builder.

 

This lets me test multiple examples at once (you have to turn on multi-line) to see how the rule is shaping up. Then you can build the rule with one sample, confirm the groups that Adobe shows, and run the final Test Rules with multiple samples again.
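The multi-line workflow can also be approximated in a script. This is a sketch, assuming the tester's multi-line option corresponds to a multiline flag like Python's `re.MULTILINE`, which makes `^` and `$` match at each line so every sample is tested on its own. Only the first sample line comes from the post; the other two are hypothetical:

```python
import re

RULE = re.compile(r"^(.*?):(.*?):(.*):(.*)$", re.MULTILINE)

samples = """\
Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined
Trending Articles 456:Link:Plain Title:Undefined
No delimiters here"""

# With re.MULTILINE, ^ and $ anchor at each line, and . never crosses a newline,
# so each sample line is matched (or skipped) independently.
results = [m.groups() for m in RULE.finditer(samples)]
for groups in results:
    print(groups)
```

The third line has no colons, so it produces no match, much as a non-matching sample would fail to classify.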