Team,
Any suggestions on how to retrieve a list of URLs for PDF links that are hosted on an external server and appear on an HTML page in AEM?
Edit:
Use case: our authors add PDF links to their content (in Content Fragment RTE fields), and those PDFs are hosted on SharePoint/Teamsite servers. The business team wants a report that lists these links as they appear in the rendered HTML. I'd really appreciate any suggestions on how to make this happen!
Thanks
Hi @nj2
This is a very basic example of how you can extract the list. If you need a solution that authors can run themselves, create a landing page backed by a servlet that returns the results as an Excel/CSV file.
```groovy
import javax.jcr.query.Query

// Host strings to look for in the stored RTE HTML
def externalHosts = ["www.sharepoint.com", "www.pdf.com"]
def rootPath = "/content/dam/cf"

// JCR-SQL2 query for nodes whose "text" property contains any of the hosts.
// "text" is an assumption: in a Content Fragment the property is named after the
// model's RTE field, so add one condition per RTE (or other) field to check.
def buildQuery(session, String root, List<String> hosts) {
    def conditions = hosts.collect { "n.[text] LIKE '%${it}%'" }.join(" OR ")
    def statement = "SELECT * FROM [nt:unstructured] AS n " +
            "WHERE ISDESCENDANTNODE(n, [${root}]) AND (${conditions})"
    session.workspace.queryManager.createQuery(statement, Query.JCR_SQL2)
}

def result = buildQuery(session, rootPath, externalHosts).execute()
def paths = result.nodes.collect { it.path }

if (paths) {
    println "${rootPath}: ${paths.size()} text fields with external links"
    paths.each { println it }
}
```
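To turn the query hits into the Excel/CSV report mentioned above, the servlet still has to write each node path and link as a row. A minimal plain-Java sketch of the CSV part, assuming you have already collected a map of node path to PDF links; `PdfLinkReport` and its methods are hypothetical names, and the servlet wiring (for example setting a `text/csv` content type) is left out:

```java
import java.util.List;
import java.util.Map;

// Builds CSV text (one row per node/link pair) that a servlet could write to its response.
final class PdfLinkReport {
    // Quotes a field per RFC 4180 when it contains a comma, quote, CR or LF
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // rows: node path -> PDF links found in that node's RTE HTML
    static String toCsv(Map<String, List<String>> rows) {
        StringBuilder sb = new StringBuilder("path,link\r\n");
        rows.forEach((path, links) -> links.forEach(link ->
                sb.append(escape(path)).append(',').append(escape(link)).append("\r\n")));
        return sb.toString();
    }
}
```

Quoting follows RFC 4180 (fields containing commas, quotes or line breaks are wrapped in double quotes and embedded quotes are doubled), so the file should open cleanly in Excel.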
@nj2 If the external server exposes a REST API that you can call, you can create an OSGi service that fetches the PDF links from it via REST.
Once you have this data, you can use it on your page through an AEM component backed by a Sling Model whose fields hold the REST response data.
Some useful content on this:
https://medium.com/@codeandtheory/invoke-rest-services-in-aem-the-right-way-c5fb0af43afe
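As a rough illustration of the OSGi-service idea, here is a plain-Java sketch using `java.net.http.HttpClient` (Java 11+). It assumes a hypothetical endpoint that returns plain text with one URL per line; a real SharePoint or other REST API will have its own response format and authentication, and in AEM the class would be registered as an OSGi component rather than used standalone. `PdfLinkService` is a hypothetical name:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java stand-in for the OSGi service described above.
final class PdfLinkService {
    private final HttpClient client = HttpClient.newHttpClient();

    // Calls a hypothetical endpoint assumed to return one URL per line
    List<String> fetchPdfLinks(String endpoint) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return parsePdfLinks(response.body());
    }

    // Keeps non-blank lines ending in ".pdf" (case-insensitive); a simple suffix
    // check, so URLs with query strings after the extension are not matched
    static List<String> parsePdfLinks(String body) {
        return body.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .filter(line -> line.toLowerCase(Locale.ROOT).endsWith(".pdf"))
                .collect(Collectors.toList());
    }
}
```

A Sling Model could then call `fetchPdfLinks` (or, better, read a cached result) and expose the list to the component's HTL.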
Thanks for your response. I've updated the original question to better match the real situation.
Hello @nj2
Please check whether the JavaScript code at this link helps:
https://www.datablist.com/learn/scraping/extract-urls-from-webpage
It extracts all URLs from a webpage; you can probably customize it to extract only PDF links.
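The same idea can be expressed in Java, for example to run against the RTE HTML returned by a JCR query or the HTML of a rendered page. This sketch uses a regular expression on `href` attributes, which is adequate only for simple, well-formed RTE markup (HTML entities such as `&amp;` are not decoded); a real HTML parser would be more robust. `PdfHrefExtractor` is a hypothetical name:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts href values from HTML and keeps those whose path ends in ".pdf".
final class PdfHrefExtractor {
    // Matches href="..." or href='...'
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> pdfLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(1).trim();
            if (isPdf(url)) {
                links.add(url);
            }
        }
        return links;
    }

    // Checks only the URI path, so query strings like "?web=1" don't hide the extension
    static boolean isPdf(String url) {
        try {
            String path = URI.create(url).getPath();
            return path != null && path.toLowerCase(Locale.ROOT).endsWith(".pdf");
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: skip it
        }
    }
}
```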
Hi,
This is possible with a Groovy script.
Check the link components and RTE fields, and look for values that match the external host.
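The "external host match" step could look like the minimal sketch below. It assumes you keep a configurable list of external hosts (for example `sharepoint.com`) and treats a URL as external when its host equals one of them or is a subdomain of one; relative links such as `/content/dam/...` are treated as internal. `ExternalHostFilter` is a hypothetical name:

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Keeps URLs whose host equals, or is a subdomain of, one of the configured hosts.
final class ExternalHostFilter {
    static boolean matchesHost(String url, List<String> hosts) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (host == null) {
            return false; // relative link, e.g. /content/dam/..., is internal
        }
        String h = host.toLowerCase(Locale.ROOT);
        // Suffix check uses "." + host so "evilsharepoint.com" does not match "sharepoint.com"
        return hosts.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .anyMatch(s -> h.equals(s) || h.endsWith("." + s));
    }

    static List<String> externalLinks(List<String> urls, List<String> hosts) {
        return urls.stream().filter(u -> matchesHost(u, hosts)).collect(Collectors.toList());
    }
}
```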
Thank you for the solution. Could you provide some additional details or elaborate on it further, please?
Please find Groovy script samples at https://hashimkhan.in/aem-adobecq5-code-templates/groovy-script/
Thank you for your quick response. However, I am already familiar with Groovy scripts. My question was about the solution approach proposed by @arunpatidar, which I didn't quite understand at first.