SOLVED

AEM CPU usage spikes significantly when many requests are sent continuously

Level 2

Hi,

We've been experiencing performance issues on our AEM-backed websites for a while.

Recently, we decided to investigate and found that, even on a fresh vanilla AEM instance, the CPU usage still spikes when about 10-20 requests are sent continuously to any page in the sample content or the WKND project.

We've tried sending 20 requests per second for 10 seconds, which is equivalent to 200 users accessing the page in 10 seconds. This is not a high number. But the CPU usage still spikes to 100% and the server is inaccessible for more than 1 minute.
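
A load pattern like the one described above (20 requests per second for 10 seconds, 200 requests in total) can be sketched as follows. The tool used for the original test is not stated, so this is only an illustration: the names `fetch_status` and `run_load` are hypothetical, and the pacing is approximate.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=30):
    """Fetch the URL once; return the HTTP status code or the error name."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as exc:  # 4xx/5xx responses still carry a status code
        return exc.code
    except Exception as exc:  # timeouts, refused connections, DNS errors
        return type(exc).__name__


def run_load(url, rate=20, duration=10, fetch=fetch_status):
    """Send `rate` requests per second for `duration` seconds.

    Returns one (status, elapsed_seconds) tuple per request.
    """
    def timed(_):
        start = time.perf_counter()
        status = fetch(url)
        return status, time.perf_counter() - start

    total = rate * duration  # 20 * 10 = 200 requests with the defaults
    futures = []
    with ThreadPoolExecutor(max_workers=total) as pool:
        begin = time.perf_counter()
        for i in range(total):
            # Pace submissions so request i starts at about i / rate seconds.
            delay = begin + i / rate - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(timed, i))
        return [future.result() for future in futures]
```

Calling `run_load("http://localhost:4503/content/wknd/us/en.html")` would run the pattern against a local publish instance, assuming the default publish port and an installed WKND site; the slowest response time is then `max(elapsed for _, elapsed in results)`.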

Are there any suggestions for mitigating this performance issue?

The specs are as below:

  • Production-like Azure machine:
    • vCPUs 8
    • CPU Architecture x64
    • Memory 64 GiB
    • Hyper-V Generations V1, V2
    • Performance Score 151201
    • Processor AMD EPYC 7452 32-Core Processor
    • OS disk size 1023 GiB
    • Combined Write 128 MiB/Sec
    • Combined Read 128 MiB/Sec
    • OS Linux (RedHat 7.9)
  • Test machine:
    • 4 cores CPU Intel Core i7-8665U
    • Memory 32 GiB
    • OS Windows 11
  • Test machine:
    • 2 cores CPU
    • Memory 24 GiB
    • OS Linux (Fedora)

All of them experience the same CPU issue when running:

  • AEM service pack 6.5.14
  • With or without WKND 2.1.2
1 Accepted Solution

Correct answer by
Employee Advisor

What result would you expect from this experiment? And do you think that this test is realistic for a production-like scenario?

On the publish side, every AEM deployment comes with a dedicated caching layer (the Dispatcher), which should be used as much as possible. This means that rendering every request directly on AEM should be avoided.

6 Replies

Community Advisor

Hi @extwebd96701854,
Can you please check whether your caching mechanism is working correctly in your case? This shouldn't happen if cached files are being served.

Community Advisor

@extwebd96701854, as @TarunKumar said, you should check whether the content is being served from the cache or whether every request is hitting the publish server.

Quick check: how are you testing this? Do you have the Dispatcher in front of the publisher?

I am also surprised that this is happening on a vanilla instance. If your caching mechanism is working fine, raise an Adobe ticket and reach out to the Adobe support team.

Community Advisor

Hi @extwebd96701854,

High CPU utilization can have multiple causes. I would suggest analyzing it from all angles.

  • Check the CPU and memory utilization of the AEM process. Collect thread dumps and analyze them.
  • Study the request.log file for detailed data.
  • Check workflows, versions, and audit logs.
  • Analyze the caching strategies implemented at the AEM Dispatcher.
  • Check the maintenance tasks for the Oak repository.
  • Check indexing and review the JCR queries.
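
As a starting point for the request.log check in the list above, the slowest requests can be pulled out with a short script. This is a minimal sketch that assumes the default log layout, where a request line (`[id] -> METHOD path`) is later followed by a response line with the same id (`[id] <- status content-type Nms`); the function name `slowest_requests` is hypothetical, and the patterns need adjusting if the log format has been customized.

```python
import re

# Assumed default layout of request.log entries:
#   <timestamp> [42] -> GET /content/page.html HTTP/1.1
#   <timestamp> [42] <- 200 text/html 850ms
START = re.compile(r"\[(\d+)\] -> (\S+) (\S+)")
END = re.compile(r"\[(\d+)\] <- (\d{3}) .*?(\d+)ms\s*$")


def slowest_requests(lines, top=10):
    """Pair request and response entries by id and return the slowest ones.

    Returns (milliseconds, status, method, path) tuples, slowest first.
    """
    pending = {}   # request id -> (method, path), waiting for a response
    finished = []
    for line in lines:
        start = START.search(line)
        if start:
            request_id, method, path = start.groups()
            pending[request_id] = (method, path)
            continue
        end = END.search(line)
        if end:
            request_id, status, millis = end.groups()
            # The request line may be missing if the log was rotated.
            method, path = pending.pop(request_id, ("?", "?"))
            finished.append((int(millis), int(status), method, path))
    return sorted(finished, reverse=True)[:top]
```

Passing the open log file (typically `crx-quickstart/logs/request.log`) as `lines` returns the ten slowest requests. If the same few paths dominate the result, those are the components or queries to look at first.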

 

Hope this helps.

Level 2

Hey @Jörg_Hoh 

What is the recommendation when there is no static content that can be cached, for example when resources are rendered based on the permissions of the user accessing the solution? We cache whatever resources we can on both the Dispatcher and the CDN (in our production setup), but we still experience these challenges during peak traffic.

Employee Advisor

In this case, this is more or less expected. On the other hand, even in this scenario the rendering is not completely personalized; typically, large parts of the result are identical for all users (for example, the page navigation).
You can make use of this by splitting the pages into static and dynamic parts and rendering only the dynamic parts on publish for every request; Sling Dynamic Include is often used for this.
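
The static/dynamic split can be illustrated with a toy model. This is not how Sling Dynamic Include is implemented; it only shows the principle that the cached page carries an include tag (SSI syntax here) in place of the dynamic component, and that only this fragment is rendered per request. All names, paths, and markup below are made up for illustration.

```python
import re

# SSI-style include tag that marks a dynamic fragment in the cached page.
INCLUDE = re.compile(r'<!--#include virtual="([^"]+)" -->')

# Static shell: rendered once, then served from the cache to every user.
CACHED_PAGE = (
    "<nav>Home | Products</nav>"
    '<!--#include virtual="/fragments/greeting" -->'
    "<footer>Footer</footer>"
)


def render_fragment(path, user):
    """Stand-in for a request to publish that renders one dynamic component."""
    if path == "/fragments/greeting":
        return f"<p>Hello, {user}</p>"
    return ""


def serve(user, cached_page=CACHED_PAGE):
    """Serve the cached shell, resolving only the include tags per request."""
    return INCLUDE.sub(lambda m: render_fragment(m.group(1), user), cached_page)
```

Here `serve("alice")` and `serve("bob")` share the cached navigation and footer and differ only in the greeting, so publish renders one small fragment per request instead of the whole page.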