Classification Rule Builder - Regular Expression Help

Question

I am updating an existing implementation that uses a colon delimiter. One of our fields is a text field that captures the text of various items, some of which are article names. The issue is that some article titles contain a colon, which is throwing off the existing rule.

Here is the expression I am using.

(?i)^(.*?):(.*):(.*):(.*)(?-i)

What I am trying to achieve is the following:

$0Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:Undefined

To be: (note the colon on $3)

$1Trending Articles 123

$2Link

$3Emerging Issues Executive Quarterly: Insurance in the Inflationary Era

$4Undefined

It is currently returning:

$0Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:

$1Trending Articles 123

$2Link:Emerging Issues Executive Quarterly

$3 Insurance in the Inflationary Era

$4

Any help modifying the regular expression is welcome: (?i)^(.*?):(.*):(.*):(.*)(?-i)

Jennifer_Dungan · Accepted Answer

Hi,

I am not sure what the (?i) at the beginning or the (?-i) at the end is supposed to be doing in your regex. When I tried this in an online regex tester, both came back as invalid.

However, using the string

"Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:"

with this regex:

^(.*?):(.*?):(.*):(.*)$

results in:

$0 - Trending Articles 123:Link:Emerging Issues Executive Quarterly: Insurance in the Inflationary Era:
$1 - Trending Articles 123
$2 - Link
$3 - Emerging Issues Executive Quarterly: Insurance in the Inflationary Era
$4 - ""

Basically, (.*?) in the regex breaks down as follows: "." matches any character except line breaks, "*" matches 0 or more of them, and "?" makes the quantifier lazy, so it matches as little as possible.

In your original regex (minus the odd things), that meant the first group stopped at the first colon in your string.

The next group (Link), however, was not lazy. Because the text has one more colon than the pattern expects, that greedy group absorbed the extra text and returned "Link:Emerging Issues Executive Quarterly". I just made the second group lazy as well, so the break now occurs after "Link". The third group (the article title with the extra colon) stays greedy, so it takes everything, colons included, up to the last colon in the string, and whatever follows that last colon becomes your part 4.
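To see the difference between the two patterns side by side, here is a minimal sketch in Python. It is only an illustration: the Classification Rule Builder may use a different regex engine, the inline (?i) and (?-i) flags are left out, and the helper name split_fields is made up for this example.

```python
import re

# Sample value from the question (it ends with an empty fourth field).
text = ("Trending Articles 123:Link:Emerging Issues Executive Quarterly: "
        "Insurance in the Inflationary Era:")

# Original pattern without the inline flags: the second group is greedy.
original = re.compile(r"^(.*?):(.*):(.*):(.*)")
# Suggested pattern: the second group is lazy, the third stays greedy.
suggested = re.compile(r"^(.*?):(.*?):(.*):(.*)$")


def split_fields(pattern, value):
    """Return the captured groups as a tuple, or None if nothing matches."""
    match = pattern.match(value)
    return match.groups() if match else None


original_groups = split_fields(original, text)
suggested_groups = split_fields(suggested, text)

for label, groups in (("original", original_groups),
                      ("suggested", suggested_groups)):
    print(label)
    for number, value in enumerate(groups, start=1):
        print(f"  ${number} - {value!r}")
```

With the suggested pattern, $2 comes back as "Link" and $3 keeps the full article title including its colon. If the value ends in ":Undefined" instead of a bare colon, $4 becomes "Undefined".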

I hope this helps
