
Scan HTML content for URLs and cURL them


Hi

I'm working on a solution that scans planned deliveries for URLs (in their HTML source), cURLs those URLs, and checks their HTTP responses.

The goal is to avoid sending campaigns with broken URLs.
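The checking step described above (request each URL and look at its HTTP response) can be sketched as follows. This is a minimal Python sketch using only the standard library, not the cURL-based implementation from the post. The function names `check_url` and `is_broken`, and the rule "no response or a 4xx/5xx status means broken", are assumptions for illustration. It tries a HEAD request first, so the body is not downloaded, and falls back to GET when a server answers HEAD with 405 or 501.

```python
import http.client
import urllib.error
import urllib.request


def is_broken(status):
    """Assumed rule: no response (None) or any 4xx/5xx status is a broken link."""
    return status is None or status >= 400


def check_url(url, timeout=10.0):
    """Return the final HTTP status code for url, or None if no response arrived.

    HEAD is tried first; servers that reject HEAD with 405 or 501 are retried
    with GET. urllib's default opener follows redirects, so the code returned
    is the one at the end of the redirect chain.
    """
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            if method == "HEAD" and err.code in (405, 501):
                continue  # HEAD not supported here: retry the same URL with GET
            return err.code  # a real HTTP error status such as 404 or 500
        except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
            return None  # DNS failure, refused connection, timeout, malformed URL
    return None
```

Note that some servers answer generic clients with 403, so a link flagged this way may still work in a browser; setting a browser-like `User-Agent` header on the `Request` is one option if that happens.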

I have the second part (the cURLing) working, but I can't figure out how to collect all the URLs (including img src URLs) to feed into the cURL activity.

I tried a regex on [content/html/source] in nms:delivery. It worked, but it isn't a recommended approach (safety-wise), and it broke on more complex HTML.

How can I achieve this with JavaScript? Has anybody else built a similar solution?
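For collecting the URLs from the HTML source, a real HTML parser is more robust than a regex: it copes with attribute order, quoting style, letter case and entity escapes such as `&amp;`. The sketch below uses Python's standard-library `html.parser` as a stand-in for whatever HTML parser is available in your JavaScript environment; it is not an Adobe Campaign API. The tag/attribute table `URL_ATTRIBUTES`, the optional `base_url` used to resolve relative links, and the names `LinkCollector` and `extract_urls` are assumptions. Only http(s) URLs are kept, so `mailto:` links and page anchors are skipped, and duplicates are dropped in document order. Links that still contain personalization placeholders are returned as written and would probably need rendering before they can be checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed set of tags/attributes that carry URLs in e-mail HTML; extend as needed.
URL_ATTRIBUTES = {
    "a": ("href",),
    "area": ("href",),
    "img": ("src",),
    "link": ("href",),
    "table": ("background",),
    "td": ("background",),
}


class LinkCollector(HTMLParser):
    """Collect URL-valued attributes with a real HTML tokenizer instead of a regex.

    HTMLParser lowercases tag and attribute names and unescapes attribute
    values (so "&amp;" in an href becomes "&").
    """

    def __init__(self, base_url=None):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRIBUTES.get(tag, ())
        for name, value in attrs:
            if name in wanted and value:  # value is None for valueless attributes
                url = value.strip()
                if self.base_url:
                    url = urljoin(self.base_url, url)  # resolve relative links
                self.urls.append(url)

    # Self-closing tags such as <img ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so they are covered.


def extract_urls(html, base_url=None):
    """Return unique http(s) URLs from the scanned attributes, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    seen = set()
    result = []
    for url in collector.urls:
        if urlsplit(url).scheme in ("http", "https") and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Keeping extraction separate from checking means the resulting list can be handed to whatever does the HTTP requests (the cURL activity in this case) without changes.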


Jean-Serge:

Hi Szymons,

You could of course use content/html/source with a regular expression in JavaScript (as Jon Wodnicki explained recently for another purpose).

But the easier way is to use the xtk:trackingUrl schema with a workflow Query activity or a JavaScript activity (or its underlying table, for direct SQL queries).

Regards

J-Serge

Szymons:

Hi Jean

Thanks for the answer.

Doesn't the xtk:trackingUrl schema contain only entries from deliveries that have already been sent? I want to check the URLs before sending.

Jean-Serge:

No, this table contains all the URLs of a "published" delivery (through the delivery's Tracking & Images button).

URLs remain available until the purge limit (a global instance option). You can also define a resource validity period per delivery; for URLs, it is stored in nms:trackingUrl:tsValidity.


Please also have a look at nms:trackingUrlInfo for the fields common to nms:trackingUrl.

To select only deliveries that have been sent (in progress, completed, stopped, etc.), use the nms:delivery properties/status in your workflow activities (JavaScript with queryDef, or a direct Query activity).

Szymons:

Jean,

could you please explain or show how I get a delivery into that "Published" status?

I created a test delivery with some HTML content (links included), but I cannot make it appear in the xtk:trackingUrl schema.

Jean-Serge:

Sorry about the "published" status, I misled you. My bad, I'm not a native English speaker.

Please use nms:delivery.@status or other delivery fields, depending on your goal.

When you use the Tracking & Images button, it displays the content of the trackingUrl table for a delivery:

[Screenshot: a delivery's Tracking & Images view listing its URLs]

Szymons:

My delivery has the status "Being edited" and shows links in the tab you pointed to, but I still cannot find them in the TrackingUrl schema. Any idea why they aren't appearing?