Adobe Experience Manager Sites & More

nj2 · 10/19/23

Team,

Any suggestions on how to retrieve a list of the URLs of PDF links that are hosted on an external server and appear on an HTML page in AEM?

Edit:

Use case - our authors are adding PDF links to their content (using Content Fragment - RTE), and those PDFs are hosted on SharePoint/Teamsite servers. The business team is looking for a way to create a report that shows these links within the rendered HTML. I'd really appreciate any suggestions you might have on how to make this happen!

Thanks

Harwinder-singh · 10/19/23

@nj2 If this external service exposes a REST API that you can tap into, you can create an OSGi service that fetches the PDF links from the external server with a REST call.

Once you have this data, you can use it in your page via an AEM component backed by a Sling Model whose fields hold the REST response data.

Some useful content on this:

https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/how-to-call-3rd-part-rest-...

https://medium.com/@codeandtheory/invoke-rest-services-in-aem-the-right-way-c5fb0af43afe

nj2 · 10/21/23

Thanks for your response. I've updated the original question to better match the real situation.

aanchal-sikka · 10/19/23

Hello @nj2

Please check whether the JavaScript code at the link below helps.

https://www.datablist.com/learn/scraping/extract-urls-from-webpage

It extracts all URLs from a webpage. You can probably customize it to extract only PDFs.
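As a rough illustration of restricting URL extraction to PDFs, here is a minimal Java sketch (Java rather than the linked JavaScript, so it could also run server-side in AEM). The class name `PdfLinkExtractor` and the sample HTML are hypothetical, and the regex only handles quoted `href` attributes; an HTML parser would be more robust for real content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls PDF links out of a chunk of rendered HTML.
class PdfLinkExtractor {
    // Matches href="..." or href='...' and captures the attribute value.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns every href value whose path (ignoring query string and fragment) ends in .pdf. */
    static List<String> extractPdfLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(2).trim();
            // Drop ?query and #fragment before checking the extension.
            String path = url.split("[?#]", 2)[0];
            if (path.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                links.add(url);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup is made up for illustration.
        String html = "<p><a href=\"https://example.sharepoint.com/docs/Guide.PDF?web=1\">Guide</a> "
                    + "<a href='/content/site/page.html'>Page</a></p>";
        System.out.println(extractPdfLinks(html));
    }
}
```

The extension check is case-insensitive and ignores query strings, since SharePoint-style links often carry parameters after the file name.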

Aanchal Sikka

arunpatidar · 10/20/23

Hi,

This is possible with a Groovy script.

You check the link components and RTE fields and look for a match against the external hosts.
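The "external host match" could look something like the sketch below. It is plain Java (which can also be called from a Groovy script); the class name `ExternalHostMatcher` and the host names are assumptions for illustration, not values from this thread's environment.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: decides whether a link points at one of the external hosts.
class ExternalHostMatcher {
    /** True when the URL's host equals one of the given hosts or is a subdomain of one. */
    static boolean isOnExternalHost(String url, List<String> externalHosts) {
        String host;
        try {
            host = new URI(url.trim()).getHost();
        } catch (URISyntaxException e) {
            return false; // malformed link; treated as no match in this sketch
        }
        if (host == null) {
            return false; // relative/internal link without a host
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String external : externalHosts) {
            String candidate = external.toLowerCase(Locale.ROOT);
            if (host.equals(candidate) || host.endsWith("." + candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example hosts only; replace with the real SharePoint/Teamsite hosts.
        List<String> hosts = Arrays.asList("sharepoint.com", "teamsite.example.com");
        System.out.println(isOnExternalHost("https://contoso.sharepoint.com/sites/hr/policy.pdf", hosts)); // true
        System.out.println(isOnExternalHost("/content/dam/site/policy.pdf", hosts)); // false
    }
}
```

Comparing parsed hosts rather than doing a plain substring search avoids false positives such as a host name that merely contains the external domain as text.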

Arun Patidar

nj2 · 10/21/23

Thank you for the solution. Could you provide some additional details or elaborate on it further, please?

aanchal-sikka · 10/21/23

@nj2

Please find Groovy script samples at https://hashimkhan.in/aem-adobecq5-code-templates/groovy-script/

Aanchal Sikka

nj2 · 10/22/23

Thank you for your quick response. However, I am already familiar with Groovy scripts. My question was about the solution approach proposed by @arunpatidar, which I didn't quite understand initially.

arunpatidar · 10/23/23

Hi @nj2

This is a really basic example of how you can extract the list. If you are looking for a solution that authors can use themselves, create a landing page backed by a servlet that returns the results as Excel/CSV.

// Host strings to search for; replace with your real SharePoint/Teamsite hosts
def externalSharepointLink = "www.sharepoint.com"
def externalPdfLink = "www.pdf.com"
def searchRoot = "/content/dam/cf"

// Query to match the [text] property against the host strings above.
// [text] is only an example; add OR conditions for the RTE or other
// properties your Content Fragments actually use.
def buildQuery(root, hosts) {
  def queryManager = session.workspace.queryManager
  def matches = hosts.collect { "text.[text] LIKE '%" + it + "%'" }.join(" OR ")
  def statement = "SELECT * FROM [nt:unstructured] AS text WHERE ISDESCENDANTNODE(text, '" + root + "') AND (" + matches + ")"
  queryManager.createQuery(statement, 'JCR-SQL2')
}

def query = buildQuery(searchRoot, [externalSharepointLink, externalPdfLink])
def result = query.execute()

// Read the result iterator once so the count and the listing agree
def nodes = result.nodes.toList()
if (nodes.size() > 0) {
    println searchRoot + ' text components with external links: ' + nodes.size()
    nodes.each { node ->
        println node.path
    }
}
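For the author-facing report, the servlet would need to turn the matches into CSV. The following is a minimal sketch of just that formatting step in plain Java; the class name `LinkReportCsv`, the two-column layout (`path,link`), and the sample data are assumptions, and wiring it into an actual Sling servlet (for example, writing the string to the response with a `text/csv` content type) is left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: formats "node path -> external links" as CSV text.
class LinkReportCsv {
    /** Quotes a field in the RFC 4180 style: wrap in quotes and double any embedded quotes. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds a header row followed by one row per (path, link) pair. */
    static String toCsv(Map<String, List<String>> linksByPath) {
        StringBuilder csv = new StringBuilder("path,link\r\n");
        for (Map.Entry<String, List<String>> entry : linksByPath.entrySet()) {
            for (String link : entry.getValue()) {
                csv.append(escape(entry.getKey())).append(',').append(escape(link)).append("\r\n");
            }
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Sample data is made up; in practice it would come from the query results.
        Map<String, List<String>> found = new LinkedHashMap<>();
        found.put("/content/dam/cf/hr/policy",
                  Arrays.asList("https://contoso.sharepoint.com/sites/hr/Policy, v2.pdf"));
        System.out.print(toCsv(found));
    }
}
```

Escaping matters here because SharePoint file names can contain commas or quotes, which would otherwise shift columns when the file is opened in Excel.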
