Hi All,
I have the below scenario occurring for the STAGE environment -
The response time & its caching action from dispatcher for home page requests varies greatly.
From the AEM dispatcher logs we have the below info -
[29/Nov/2022:01:31:49] [I] "GET /content/brandA/us/en/home.html" -7728ms [publishfarm/0] [actionmiss] stage.brandA.com
[29/Nov/2022:02:16:19] [I] "GET /content/brandA/us/en/home.html" -5857ms [publishfarm/0] [actionmiss] stage.brandA.com
[29/Nov/2022:02:16:23] [I] "GET /content/brandA/us/en/home.html" [publishfarm/0] [actionnone] stage.brandA.com
[29/Nov/2022:02:16:56] [I] "GET /content/brandA/us/en/home.html" -3ms [publishfarm/-] [actionhit] stage.brandA.com
[29/Nov/2022:02:28:02] [I] "GET /content/brandA/us/en/home.html" -3ms [publishfarm/-] [actionhit] stage.brandA.com
[29/Nov/2022:02:31:26] [I] "GET /content/brandA/us/en/home.html" -1886ms [publishfarm/0] [actionmiss] stage.brandA.com
[29/Nov/2022:05:05:41] [I] "GET /content/brandA/us/en/home.html" -5ms [publishfarm/-] [actionhit] stage.brandA.com
[29/Nov/2022:06:32:48] [I] "GET /content/brandA/us/en/home.html" -3ms [publishfarm/-] [actionhit] stage.brandA.com
[29/Nov/2022:07:35:17] [I] "GET /content/brandA/us/en/home.html" 2002280ms [publishfarm/0] [actionnone] stage.brandA.com
As you can see for the same domain the requests are getting served in 3 ways-
1. [actionhit] indicating 4/9 requests were cached are being served in 3-5ms by publishfarm/-
2. [actionmiss] indicating 3/9 requests were not cached taking varying time - 1.88s, 5.8s, 7.7s by publishfarm/0
3. [actionone] indicating 2/9 requests were not to be cached and took more than 2000s / 30m to be served by publishfarm/0.
Any help on why such a varying response time for the same request URL and for same domain stage.brandA.com ?
@Mohit_KBansal, @Jörg_Hoh, @arunpatidar, @B_Sravan, @kautuk_sahni
Thanks,
Rohan Garg
Solved! Go to Solution.
Topics help categorize Community content and increase your ability to discover relevant content.
@arunpatidar - The issue seems to be similar to the one with /favicon requests.
The STAGE environment has higher % of long processing time requests but even that time might not be accurate as seen with /favicon.
After our go live yesterday, I pulled in the logs for homepage requests and came up with below observations -
#1. On PROD, The slow running requests are 40 out of total 1372 which accounts for like 3%. All these requests have [actionnone] and are showing response time of +2000000ms or 33.3 min.
These requests have one similarity - All these requests are followed by really long query parameter strings which are most likely associated with couple of Google Tag Manager Scripts, Pixel Scripts and others which are embedded on our HTML.
?fbclid=, ?mm_campaign=,
Only one request with simple query parameter had a long processing time -
GET /content/brandsA/us/en/home.html?kdhfjdf" 2001428ms [publishfarm/0] [actionnone] brandsA.com
This again might not be 100% accurate as seen in /favicon requests even though icon loads in seconds, the request gets logged as taking long time.
#2. The other 1332 requests, are classified as below -
1317 requests have [actionhit] and response time < 5 ms - That's 98.9%.
#3. 14 requests have [actionmiss] and response time in range 1.4s to 2.8s which should be acceptable I think.
Hence, I am marking this issue as closed for now. The number of requests which are slow is low and all seem to be associated with 3rd party tracking scripts which are expected to increase the page load time. Also, the 33.3m stat might not be the most reliable factor as there are instances when request got served in seconds but still got logged for 33.3m.
Thank you for you help @arunpatidar !
Hi,
I would say, check the thread dumps and error logs to check why the publish instance is slow around 7:35. Also check the other requests respond time during same time.
You can ask Adobe to check on new relic if there were load issue.
Hi Arun, thanks for the quick response. It's always highly appreciated.
If you check the other query thread too, this is common to multiple requests such as /favicon.ico and other website page requests that the website response time is varying greatly.
As for the error logs, we have already validated that there are no errors, but only warnings in the system - I will try to find a pattern for any particular warning that coincides with the higher response time.
But as seen in /favicon.ico requests, the ratio of slow response time requests to fast ones is almost 1:6 maybe, so it does not seem to be a time specific issue.
Let me know your thoughts on this.
Update -
@arunpatidar - I could find the below warnings occurring at the same time as the heavy processing requests -
[29/Nov/2022:02:16:23] [I] "GET /content/brandA/us/en/home.html" [publishfarm/0] [actionnone] stage.brandA.com
[29/Nov/2022:07:35:17] [I] "GET /content/brandA/us/en/home.html" 2002280ms [publishfarm/0] [actionnone] stage.brandA.com
1. Time - 02:16:23
29.11.2022 02:16:19.909 *INFO* [GET /content/brandA/us/en/home.html HTTP/1.1] com.day.cq.wcm.core.impl.designer.SystemDesign Initialized system design at /libs/settings/wcm/designs/default in 25ms
29.11.2022 02:16:19.956 *WARN* [GET /content/brandA/us/en/home.html HTTP/1.1] com.adobe.granite.repository.impl.SystemPrincipalsValidation Refactor principal 'custom-system-user' to have principal-based access control setup.
29.11.2022 02:16:23.366 *WARN* [qtp444312771-272] com.day.crx.sling.server.impl.jmx.SecureContentRepositoryAccess Reference to 'org.apache.jackrabbit.oak.spi.state.NodeStore' Service(s) from bundle 'com.adobe.granite.license.http' is not allowed.
29.11.2022 02:16:23.602 *INFO* [GET /content/brandA/us/en/home.html HTTP/1.1] com.day.cq.wcm.core.impl.components.ComponentCacheImpl Loaded 751 components in 200ms
29.11.2022 02:16:23.668 *INFO* [GET /content/brandA/us/en/home.html HTTP/1.1] com.day.cq.wcm.core.impl.designer.SystemDesign Initialized system design at /libs/settings/wcm/designs/default in 16ms
29.11.2022 02:16:23.711 *WARN* [GET /content/brandA/us/en/home.html HTTP/1.1] com.adobe.granite.repository.impl.SystemPrincipalsValidation Refactor principal 'custom-media-system-user' to have principal-based access control setup.
29.11.2022 02:16:24.349 *INFO* [GET /content/brandA/us/en/home.html HTTP/1.1] org.apache.sling.xss.impl.HtmlToHtmlContentContext AntiSamy warning: The br tag contained an attribute that we could not process. The _rte_temp_br attribute has been filtered out, but the tag is still in place. The value of the attribute was "brEOB".
2. 07:35:17
29.11.2022 07:35:17.261 *WARN* [qtp337850597-86] com.day.crx.sling.server.impl.jmx.SecureContentRepositoryAccess Reference to 'org.apache.jackrabbit.oak.spi.state.NodeStore' Service(s) from bundle 'com.adobe.granite.license.http' is not allowed.
29.11.2022 07:35:18.878 *INFO* [GET /content/bdl/us/en/home.html HTTP/1.1] org.apache.sling.xss.impl.HtmlToHtmlContentContext AntiSamy warning: The br tag contained an attribute that we could not process. The _rte_temp_br attribute has been filtered out, but the tag is still in place. The value of the attribute was "brEOB".
Both of these have the below common Info log -
org.apache.sling.xss.impl.HtmlToHtmlContentContext AntiSamy warning: The br tag contained an attribute that we could not process.The _rte_temp_br attribute has been filtered out, but the tag is still in place. The value of the attribute was "brEOB".
Do you think this can be the responsible for such a heavy drag on the request processing time ?
I am not sure, if this is responsible
Try to get request log and rlog.jar utility in local, to see which are the time consuming request.
@arunpatidar - The issue seems to be similar to the one with /favicon requests.
The STAGE environment has higher % of long processing time requests but even that time might not be accurate as seen with /favicon.
After our go live yesterday, I pulled in the logs for homepage requests and came up with below observations -
#1. On PROD, The slow running requests are 40 out of total 1372 which accounts for like 3%. All these requests have [actionnone] and are showing response time of +2000000ms or 33.3 min.
These requests have one similarity - All these requests are followed by really long query parameter strings which are most likely associated with couple of Google Tag Manager Scripts, Pixel Scripts and others which are embedded on our HTML.
?fbclid=, ?mm_campaign=,
Only one request with simple query parameter had a long processing time -
GET /content/brandsA/us/en/home.html?kdhfjdf" 2001428ms [publishfarm/0] [actionnone] brandsA.com
This again might not be 100% accurate as seen in /favicon requests even though icon loads in seconds, the request gets logged as taking long time.
#2. The other 1332 requests, are classified as below -
1317 requests have [actionhit] and response time < 5 ms - That's 98.9%.
#3. 14 requests have [actionmiss] and response time in range 1.4s to 2.8s which should be acceptable I think.
Hence, I am marking this issue as closed for now. The number of requests which are slow is low and all seem to be associated with 3rd party tracking scripts which are expected to increase the page load time. Also, the 33.3m stat might not be the most reliable factor as there are instances when request got served in seconds but still got logged for 33.3m.
Thank you for you help @arunpatidar !
There could be many reasons for that, although 30min rather point to a more general problem. What about the performance of the other requests on the same dispatcher/AEM publish, were they slow in the same way?
Hi @Jörg_Hoh - Yes, there is one other request which is behaving this way (/favicon based).
Also, another interesting observation is when we hit the direct request from publish or even dispatcher, the icon takes 1-2 seconds to load but the request processing time is still shown as +30m.
I think the stat for request processing time might be inaccurate but since we validated on local that the page is getting cached, it need not be a concern from performance point of view I think.
Views
Likes
Replies