URL on a page | Community
Level 3
October 19, 2023
Solved

URL on a page


Team, 

Any suggestions on how to retrieve a list of URLs for PDF links that are hosted on an external server and appear on an HTML page in AEM?

Edit:

Use case - our authors are adding PDF links to their content (using Content Fragment - RTE), and those PDFs are hosted on SharePoint/Teamsite servers. The business team is looking for a way to create a report that shows these links within the rendered HTML. I'd really appreciate any suggestions you might have on how to make this happen!

 

Thanks

Best answer by arunpatidar

Hi @nj2 

This is a very basic example of how you can extract the list. If you are looking for a solution that authors can use, create a landing page backed by a servlet that returns the results as Excel/CSV.

// Hosts to search for (example values; replace with your SharePoint/Teamsite hosts)
def externalSharepointLink = "www.sharepoint.com"
def externalPdfLink = "www.pdf.com"

// Build a JCR-SQL2 query for nodes under /content/dam/cf whose [text] property
// contains either host. Extend the WHERE clause for RTE node types or other fields.
// The hosts are passed in because script-level variables are not visible inside methods.
def buildQuery(String hostA, String hostB) {
    def queryManager = session.workspace.queryManager
    def statement = "SELECT * FROM [nt:unstructured] AS text " +
        "WHERE ISDESCENDANTNODE(text, [/content/dam/cf]) " +
        "AND (text.[text] LIKE '%" + hostA + "%' OR text.[text] LIKE '%" + hostB + "%')"
    queryManager.createQuery(statement, 'JCR-SQL2')
}

// Run the query and read the node iterator once
def result = buildQuery(externalSharepointLink, externalPdfLink).execute()
def nodes = result.nodes.toList()

if (nodes.size() > 0) {
    println 'Text components with external links: ' + nodes.size()
    nodes.each { node -> println node.path }
}
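
For the author-facing report suggested in this answer, the servlet would have to turn the matches into CSV at some point. The following is only a rough sketch of that step in plain Java, with no AEM APIs involved. The class name `LinkReportCsv` and the two-column layout (content path, external URL) are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: formats (content path, external URL) pairs as CSV text.
class LinkReportCsv {

    // Quote a field: wrap it in double quotes and double any embedded quotes.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Each row is a two-element array: {contentPath, externalUrl}.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("\"path\",\"url\"\r\n");
        for (String[] row : rows) {
            sb.append(quote(row[0])).append(',').append(quote(row[1])).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"/content/dam/cf/sample", "https://www.sharepoint.com/docs/a.pdf"});
        System.out.print(toCsv(rows));
    }
}
```

In a real servlet the returned string would be written to the response with a `text/csv` content type; that wiring is left out here.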

 

3 replies

Harwinder-singh
Community Advisor
October 19, 2023

@nj2 If this external service exposes a REST API that you can tap into, you can create an OSGi service that fetches the PDF links from the external server through a REST call.

Once you have this data, you can use it on your page via an AEM component backed by a Sling Model whose fields hold the REST response data.

Some useful content on this:

 https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/how-to-call-3rd-part-rest-api-from-aem-in-server-side-code/m-p/402461

https://medium.com/@codeandtheory/invoke-rest-services-in-aem-the-right-way-c5fb0af43afe

 

nj2 (Author)
Level 3
October 21, 2023

Thanks for your response. I've updated the original question to better match the real situation.

aanchal-sikka
Community Advisor
October 20, 2023

Hello @nj2 

 

Please check whether the JavaScript code at the link below helps.

https://www.datablist.com/learn/scraping/extract-urls-from-webpage

 

It extracts all URLs from a webpage. You can probably customize it to extract only PDF links.
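
As a rough illustration of the "only PDF links" customization, here is a sketch in plain Java rather than the browser JavaScript used on the linked page. It scans rendered HTML for `href` attributes with a regular expression and keeps the ones whose path ends in `.pdf`. The class name `PdfLinkExtractor` is hypothetical, and a regex is a simplification: a proper HTML parser would be more robust against unusual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls PDF links out of an HTML string.
class PdfLinkExtractor {

    // Matches href="..." or href='...' and captures the URL in group 2.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    static List<String> extractPdfLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(2);
            // Drop any query string or fragment before checking the extension.
            String path = url.split("[?#]", 2)[0];
            if (path.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                links.add(url);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://www.sharepoint.com/docs/a.pdf\">A</a>"
                    + "<a href=\"/content/site/page.html\">B</a>";
        extractPdfLinks(html).forEach(System.out::println);
    }
}
```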

Aanchal Sikka
arunpatidar
Community Advisor
October 20, 2023

Hi,

This is possible with a Groovy script.

Check the link components and RTE fields, and look for a match on the external host.
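
The host match itself could look something like the sketch below, written in plain Java so that it can be tried outside AEM. The class name `ExternalHostMatcher`, the method name, and the idea of passing the external hosts as a set are assumptions for illustration. It treats a link as external when its host equals one of the configured hosts or is a subdomain of one.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a link points at one of the external hosts.
class ExternalHostMatcher {

    static boolean isOnExternalHost(String link, Set<String> externalHosts) {
        try {
            String host = new URI(link).getHost();
            if (host == null) {
                return false; // relative link, so it is served by AEM itself
            }
            host = host.toLowerCase(Locale.ROOT);
            for (String external : externalHosts) {
                // Exact host or any subdomain of it
                if (host.equals(external) || host.endsWith("." + external)) {
                    return true;
                }
            }
            return false;
        } catch (URISyntaxException e) {
            return false; // not a parseable URI; treat as no match
        }
    }

    public static void main(String[] args) {
        Set<String> hosts = Set.of("sharepoint.com");
        System.out.println(isOnExternalHost("https://team.sharepoint.com/docs/a.pdf", hosts));
        System.out.println(isOnExternalHost("/content/dam/site/a.pdf", hosts));
    }
}
```

Parsing the host instead of doing a plain substring search avoids false hits when the host name only appears in a path or query string.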

Arun Patidar
nj2 (Author)
Level 3
October 21, 2023

Thank you for the solution. Could you provide some additional details or elaborate on it further, please?

aanchal-sikka
Community Advisor
October 22, 2023

@nj2 

 

Please find Groovy script samples at https://hashimkhan.in/aem-adobecq5-code-templates/groovy-script/

 

Aanchal Sikka