Hello All,
I would like to share a performance challenge that one AEM customer faced with AEM XML Documentation, and the solution that fixed it.
The customer uses the AEM XML Documentation solution on AEM 6.3.0.2.
They upload a ZIP of HTML files, which is processed by OOTB DITA workflows and custom workflows to create a new set of content for their product(s).
Problem faced:
- Inconsistent performance was observed while processing the input HTML data.
  - For the same input data, processing sometimes took X min and sometimes Y min.
- Processing time was consistently long, irrespective of the size of the input data.
  - Processing always took more than 15 min.
- Slowness was observed across:
  - The async processing step that validates files for DTD compliance.
  - The automated workflow step that creates the preview page and replicates it.
Client Environment:
AEM Version | 6.3.0.2 |
Cores | 8 Cores |
RAM | 64 GB |
XML Add-on | 3.4 |
Splitting the process:
To find the root cause, we first broke the whole process down into its distinct steps. These were:
1. Completion of Preset availability
2. Completion of Preview page correction
3. Activation of preview page/ assets
4. Update of product pages
Extensive testing showed that considerable time was spent in two of these steps: Preview page correction and Activation of preview page/assets.
Thread dump analysis:
To investigate further, we analyzed the thread dumps shared by the customer. They covered the 20-minute period during which the whole processing occurred, so we had 20 thread dumps. Each dump contains 10 jstack files, capturing the state of the threads every 4 seconds.
So, overall, we had 200 thread dump snapshots to analyze.
We went through each of these 200 snapshots and looked for threads that were found running frequently.
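Going through 200 snapshots by hand is tedious, so a small script can do the first pass. The following is only a minimal sketch of that kind of counting, not the tooling we actually used: it assumes each snapshot is a plain-text jstack output in the usual HotSpot layout (a quoted thread name on a header line, followed by a `java.lang.Thread.State:` line), and the function names are made up for illustration.

```python
import re
from collections import Counter

# A jstack thread entry starts with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"([^"]+)"')
# The line after the header normally reports the thread state.
THREAD_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (\w+)')


def runnable_threads(snapshot):
    """Return the set of thread names reported as RUNNABLE in one snapshot."""
    names = set()
    current = None
    for line in snapshot.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        state = THREAD_STATE.match(line)
        if state and current is not None:
            if state.group(1) == "RUNNABLE":
                names.add(current)
            current = None
    return names


def count_across_snapshots(snapshots):
    """Count in how many snapshots each thread name shows up as RUNNABLE."""
    counts = Counter()
    for snapshot in snapshots:
        counts.update(runnable_threads(snapshot))
    return counts


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage (hypothetical file names): python count_threads.py dump1.txt dump2.txt ...
    texts = [Path(p).read_text(errors="replace") for p in sys.argv[1:]]
    for name, hits in count_across_snapshots(texts).most_common(10):
        print(f"{hits:4d}/{len(texts)}  {name}")
```

Threads that appear as RUNNABLE in nearly every snapshot are the ones worth a closer look. Pooled threads whose names carry a changing numeric suffix would need to be normalized before counting, which this sketch does not do.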
We found that the 'async-index-update' thread was triggered multiple times and was present in almost all of the thread dumps.
Each such indexing run was found to take almost 4 seconds.
We also observed that this indexing was being triggered for around 5 custom Oak indexes, because of the significant node changes happening under some of the folders involved in upload and processing.
Based on these observations, we suggested that the customer exclude some of the folders that see extensive content creation (such as /content/dam/htmlfiles) in the definitions of the aforementioned 5 indexes.
We suggested this because it was confirmed that excluding these folders would have no impact on functionality.
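To illustrate what such an exclusion means, here is a simplified model of path-based index filtering. It assumes the exclusion is done through the `excludedPaths` property that Oak index definitions support (alongside `includedPaths`); the post above does not record the exact property values used, and the function names below are hypothetical. This is not Oak's own code, only a sketch of the idea that changes under an excluded folder no longer concern the index.

```python
def is_under(path, root):
    """True if path equals root or lies somewhere below it."""
    root = root.rstrip("/")
    if root == "":  # root was "/", which covers every absolute path
        return True
    return path == root or path.startswith(root + "/")


def index_covers(path, included_paths=("/",), excluded_paths=()):
    """Simplified model: exclusions win, otherwise the path must be included."""
    if any(is_under(path, excluded) for excluded in excluded_paths):
        return False
    return any(is_under(path, included) for included in included_paths)


# Assumed example value, based on the folder named in this post.
excluded = ["/content/dam/htmlfiles"]
print(index_covers("/content/dam/htmlfiles/upload/a.html", excluded_paths=excluded))
print(index_covers("/content/dam/products/b.html", excluded_paths=excluded))
```

With the exclusion in place, the first path is no longer covered by the index while the second one still is, so heavy churn under the upload folder stops driving index updates for that definition.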
Once this was done, performance improved massively. A file that used to take 25 min to process now completes in just 2-3 min.
Hope this blog helps!
Regards,
Venkatesh