I've got a request to replace instances of internal 'http' urls to 'https' inside rich text components.
So I'm using a SQL2 query using CONTAINS to find all nodes that have the text "http://%mysite.com" inside the text property
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND CONTAINS(node.text, 'http://%mysite.com')
This is working but is also matching cases where I have a external link followed by the text "mysite.com" further down.
For example:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit <a href="http://mysite.com">internal link</a> Duis viverra orci a ullamcorper mattis." - MATCHES
"Lorem ipsum dolor sit amet, consectetur adipiscing elit <a href="http://wikipedia.com">external link</a> Duis viverra orci a ullamcorper <a href="http://mysite.com">internal link</a> mattis." - ALSO MATCHES
"Lorem ipsum dolor sit amet, consectetur adipiscing elit <a href="http://wikipedia.com">external link</a> Duis viverra orci a ullamcorper mysite.com mattis." - ALSO MATCHES
So I would like to know if there is a way to modify the LIKE pattern or use REGEX instead to only find internal urls inside a long string.
Solved! Go to Solution.
Topics help categorize Community content and increase your ability to discover relevant content.
Views
Replies
Total Likes
Hi @csamaniego97 since your query use CONTAINS(node.text, 'http://%mysite.com') it is matching the external links as well because it literally means http://..(anything)...mysite.com i.e http:// from the external link part then anything then mysite.com text.
For simpler approach you can use:
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND CONTAINS(node.text, '<a href="http://%mysite\.com[^"]*')
You can use a more specified regular expression like this:
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND REGEX_CONTAINS(node.text, '(?i)http://([^"]+\.)?mysite\.com[^<]*', 'm')
or even more specific
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND REGEX_CONTAINS(node.text, '(?i)http://(?:([^"]+\.)?mysite\.com|(?:[^<]+))(?<=href=")', 'm')
You can also try using the LIKE with a wildcard operator:
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND node.text LIKE '%http://%mysite.com%[^<]*%'
or even more specific
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND node.text LIKE '%<a href="http://%[^<]*mysite\.com[^"]*%[^>]*%'
Hi @csamaniego97 since your query use CONTAINS(node.text, 'http://%mysite.com') it is matching the external links as well because it literally means http://..(anything)...mysite.com i.e http:// from the external link part then anything then mysite.com text.
For simpler approach you can use:
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND CONTAINS(node.text, '<a href="http://%mysite\.com[^"]*')
You can use a more specified regular expression like this:
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND REGEX_CONTAINS(node.text, '(?i)http://([^"]+\.)?mysite\.com[^<]*', 'm')
or even more specific
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND REGEX_CONTAINS(node.text, '(?i)http://(?:([^"]+\.)?mysite\.com|(?:[^<]+))(?<=href=")', 'm')
You can also try using the LIKE with a wildcard operator:
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND node.text LIKE '%http://%mysite.com%[^<]*%'
or even more specific
SELECT * FROM [nt:unstructured] AS node
WHERE ISDESCENDANTNODE('/content/mysite')
AND node.text LIKE '%<a href="http://%[^<]*mysite\.com[^"]*%[^>]*%'
Views
Likes
Replies