Issue Fetching Report Data from /etc/reports/diskusage.html via Code

Hello All,

We are working on a project where we need to fetch the disk usage report from the following URL:
/etc/reports/diskusage.html
Once fetched, we need to share this report via email to clients on a daily basis.

We tried to achieve this using a service, a scheduler, and some JavaScript. Below is the Java code we used to try to fetch the data:

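import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Assumes httpClient (a CloseableHttpClient), logger and isReportGenerated are defined elsewhere in the service
try (CloseableHttpResponse reportResponse = httpClient.execute(new HttpGet("http://localhost:4502/etc/reports/diskusage.html"))) {
    if (reportResponse.getStatusLine().getStatusCode() == 200) {
        // Read the whole response body as a string
        String reportContent = EntityUtils.toString(reportResponse.getEntity());
        // Parse the HTML and look up the report table by its id
        Document doc = Jsoup.parse(reportContent);
        Element restable = doc.getElementById("restable");
        if (restable != null && !restable.text().isEmpty()) {
            isReportGenerated = true;
            String reportData = restable.html();
            logger.info("Report content captured successfully: {}", reportData);
        }
    }
}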

Issue:

  • When we manually visit the URL in the browser, we are able to see the disk usage report.
  • However, when we access the page programmatically through our code, the restable element is present in the response, but it is empty.

Attempts Made:

  • Added Thread.sleep in the backend to account for any delay.
  • Tried using MutationObserver and window.onload in the frontend.
  • Despite these attempts, the restable element is still empty when we fetch the page programmatically.

Has anyone faced a similar issue? Is there a specific way to fetch dynamic content like this via HTTP requests? Any guidance or suggestions would be greatly appreciated!

Thanks in advance!

2 Replies

Community Advisor

@AravindB1: Not sure if you have noticed, but this page takes a long time to load because it has to gather a lot of information. For instance, it took close to 4 minutes on a local instance without much data (< 5 GB in jcr:system).

You will have to wait until the entire page has finished loading, and how long that takes will vary between instances (on subsequent requests, the time dropped to approximately 2 minutes).
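If the page really takes minutes to produce, the HTTP client has to be willing to wait that long. Below is a minimal sketch, assuming Java 11+ and the JDK's built-in java.net.http client rather than the Apache HttpClient from the question. The class name DiskUsageFetch and the 5-minute timeout are hypothetical choices; the URL is the one from the original post.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; the name and the timeout value are illustrative assumptions.
class DiskUsageFetch {

    // Assumption: 5 minutes leaves headroom over the ~4 minutes observed on a local instance.
    static final Duration REPORT_TIMEOUT = Duration.ofMinutes(5);

    // Builds a GET request that waits up to REPORT_TIMEOUT for the response.
    static HttpRequest buildReportRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(REPORT_TIMEOUT)
                .GET()
                .build();
    }

    // Sends the request and returns the full body; throws if the status is not 200.
    static String fetchReport(HttpClient client, String url) throws Exception {
        HttpResponse<String> response =
                client.send(buildReportRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller might pass HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build() together with the report URL. A longer timeout only helps if the delay happens while the server is producing the HTTP response. It does not help if the table is filled in by scripts that run in the browser after the page loads.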

Community Advisor

Hi,

What status is being returned? I believe the issue is that you're trying to access the Author instance, which requires authentication. If you hit the URL on localhost:4502 directly without credentials, you'll be asked to authenticate.
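One quick way to see what status comes back is to turn off redirect-following and inspect the raw response. Below is a minimal sketch using the JDK's java.net.http client (Java 11+). The class AuthCheck and its methods are hypothetical. The check treats a 401, or a 3xx whose Location header mentions "login", as a sign of an unauthenticated request. That is a heuristic assumption about how the instance responds, not documented AEM behavior.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Hypothetical diagnostic helper for checking what the Author instance returns.
class AuthCheck {

    // Heuristic (assumption): a 401, or a redirect whose target contains "login",
    // suggests the request reached Author without valid credentials.
    static boolean looksUnauthenticated(int status, Optional<String> location) {
        if (status == 401) {
            return true;
        }
        boolean isRedirect = status >= 300 && status < 400;
        return isRedirect && location.map(l -> l.toLowerCase().contains("login")).orElse(false);
    }

    // Sends a GET without following redirects and describes the response status.
    static String probe(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpResponse<Void> response = client.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.discarding());
        Optional<String> location = response.headers().firstValue("Location");
        return "status=" + response.statusCode()
                + (looksUnauthenticated(response.statusCode(), location)
                        ? " (authentication required?)" : "");
    }
}
```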

You have 3 options:

  1. Provide an authentication token with the request (in the Authorization header). If you are using AEMaaCS, ideally you can use this: https://experienceleague.adobe.com/en/docs/experience-manager-learn/getting-started-with-aem-headles... If not, you would need to use a simple authentication method; check here: https://sourcedcode.com/blog/aem/how-to-get-authorization-basic-auth-header-from-aem-author 
  2. Make that page public so that you can access the report from the Publish instance, where you don't need to authenticate.
  3. Find the source of this report's data and retrieve it from that original source using the Sling/JCR APIs, rather than from the HTML.
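For option 1 on a non-AEMaaCS instance, a Basic auth header is the Base64 encoding of user:password, prefixed with "Basic " (RFC 7617). Below is a minimal sketch using the JDK's java.net.http client (Java 11+). The class BasicAuthRequest, its methods, and the 5-minute timeout are hypothetical. Real credentials should come from secure configuration rather than being hard-coded.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

// Hypothetical helper; not part of any AEM API.
class BasicAuthRequest {

    // Encodes "user:password" as an HTTP Basic Authorization header value.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Builds the report request with credentials attached. The user and password are
    // placeholders; the 5-minute timeout is an assumption based on the load times reported earlier.
    static HttpRequest reportRequest(String url, String user, String password) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", basicAuthHeader(user, password))
                .timeout(Duration.ofMinutes(5))
                .GET()
                .build();
    }
}
```

For example, basicAuthHeader("admin", "admin") yields "Basic YWRtaW46YWRtaW4=". The resulting request can be sent with any java.net.http.HttpClient.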

Hope this helps!



Esteban Bustamante