akhilraj
Level 5
April 22, 2026
Question

Performance Degradation after AEM 6.5 LTS Upgrade

  • April 22, 2026
  • 6 replies
  • 170 views

In our Adobe Experience Manager internal portal (an AEM 6.5 instance with Azure OIDC authentication), the homepage invokes a few APIs that fetch data from a database and cache user-specific responses in Redis.

We normally conduct performance testing with 18,000 concurrent users (2 publishers with ~9,000 virtual users each) over a duration of 30 minutes. Prior to the upgrade, all API responses were consistently under 1 second.

Recently, we upgraded to AEM 6.5 LTS (SP21) running on Java 21. After the upgrade, under the same load conditions, the average API response time has increased significantly to around 11 seconds.

There are no functional issues observed, and all APIs are working as expected. However, this performance degradation is only noticeable during load testing.

Has anyone encountered similar performance issues after upgrading to AEM 6.5 LTS SP21 or moving to Java 21? Any insights or areas to investigate would be helpful.

6 replies

Mani_kumar_
Community Advisor
April 23, 2026

Since this is happening after the Java 21 upgrade, compare the JVM flags before and after.

Also, analyze a heap dump taken during the load run and adjust the GC parameters accordingly; this may fix, or at least improve, the performance.
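A minimal sketch of that flag comparison, assuming `jcmd` from the JDK is on the PATH. The `printf` lines are stand-in data so the sketch runs without a live publisher JVM; on real systems you would capture the output of `jcmd <aem-pid> VM.flags` on each environment instead.

```shell
# Stand-in data; normally: jcmd <aem-pid> VM.flags > flags_before.txt (pre-upgrade host)
printf '%s\n' '-XX:+UseG1GC' '-XX:MaxGCPauseMillis=150' > flags_before.txt
# And on the upgraded Java 21 host: jcmd <aem-pid> VM.flags > flags_after.txt
printf '%s\n' '-XX:+UseG1GC' '-XX:MaxGCPauseMillis=200' > flags_after.txt

# Show the flags that are new or changed after the upgrade
sort flags_before.txt > flags_before.sorted
sort flags_after.txt  > flags_after.sorted
comm -13 flags_before.sorted flags_after.sorted
```

Defaults that silently changed between JDK releases (ergonomic heap sizing, GC tuning) show up this way even when your start script did not change.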

Adobe Employee
April 23, 2026

@akhilraj We do not currently see Adobe documentation indicating a known general regression in AEM 6.5 LTS + Java 21 that would by itself explain the increase from sub-second responses to ~11 seconds under load. Adobe does, however, recommend explicit performance validation after upgrade and notes that Java 17/21 GC settings should be tuned for optimal performance rather than relying on defaults. Based on the current behavior, this appears more consistent with a load-sensitive tuning or integration issue involving JVM behavior, custom code, DB/Redis concurrency, or request/query execution under high load. [1] [2] [3]

[1] https://experienceleague.adobe.com/en/docs/experience-manager-65-lts/content/release-notes/service-pack/ga

[2]  https://experienceleague.adobe.com/en/docs/experience-manager-65-lts/content/implementing/deploying/practices/best-practices-for-performance-testing

[3] https://experienceleague.adobe.com/en/docs/experience-manager-65-lts/content/implementing/deploying/configuring/configuring-performance

akhilraj
Author
Level 5
May 5, 2026

Our platform team tried all of these options but couldn't figure out what is blocking the API calls and increasing the response time. We are not observing any errors on the publisher or dispatcher, so we couldn't identify the root cause either.

Adobe Employee
May 6, 2026

@akhilraj 

At present, since the APIs are functionally working and no explicit errors are being recorded in Publisher or Dispatcher logs, the issue appears to be a load-related performance bottleneck rather than a functional application error. In these situations, Adobe recommends collecting runtime diagnostics during the performance event—including thread dumps, GC logs, system metrics, request timing, and backend dependency metrics—because latency under high concurrency may be caused by thread contention, JVM behavior, connection-pool waits, or downstream service delays, none of which necessarily produce visible application errors. Once these diagnostics are captured during the affected load window, they can be correlated to identify the actual blocking point and root cause.
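A hypothetical capture loop for that load window: several thread dumps a few seconds apart make blocked threads and their monitors visible. The `jcmd` line is commented out and an `echo` stand-in used so the sketch runs without a live AEM JVM; `AEM_PID` is a placeholder.

```shell
AEM_PID=12345    # placeholder: publisher java pid (e.g. from `pgrep -f quickstart`)
for i in 1 2 3; do
  out="threaddump_${i}.txt"
  # jcmd "$AEM_PID" Thread.print > "$out"   # real capture on the publisher
  echo "stand-in dump $i" > "$out"          # stand-in so the sketch is runnable
  sleep 1
done
ls threaddump_*.txt | wc -l
```

Comparing the same thread IDs across consecutive dumps distinguishes a thread that is briefly waiting from one that is genuinely stuck for the whole window.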


akhilraj
Author
Level 5
May 8, 2026

We have also raised an Adobe ticket and provided all the required logs.

Waiting for their response.

In the meantime, we have also increased the pool sizes in the configurations below, but it didn't help:

  1. Apache Sling Thread pool Configuration
  2. Apache Sling Job Thread Pool
  3. Apache Felix Jetty Based HTTP Service
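For reference, the first of those tweaks can also be deployed as an OSGi config file rather than edited by hand in the console. This is a hedged sketch, not a recommended sizing: the PID and property names are taken from the Sling Commons Threads metatype, and both the names and the values should be verified against your own OSGi console (and measured thread usage) before applying.

```
# crx-quickstart/install/org.apache.sling.commons.threads.impl.DefaultThreadPool.factory-default.config
name="default"
minPoolSize=I"50"
maxPoolSize=I"200"
queueSize=I"-1"
```

Oversizing pools can make contention worse rather than better, so changes like this are best validated against thread dumps from the load window.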
chaudharynick
Level 4
May 7, 2026

Hi @akhilraj,

 

Java 21 uses a modern, NIO-based underlying Socket implementation (the legacy implementation was replaced back in JDK 13 via JEP 353). While it is more efficient, it can expose lock contention in older third-party connection pools under high load.

  • The Redis/DB Clients: If you are using Jedis (which relies on Apache Commons Pool), Lettuce, or HikariCP for your database, and you did not update these dependencies during the Java 21 migration, the older pooling mechanisms can suffer from severe synchronized block contention.

  • SSL/TLS Handshakes: If your connections to the Azure Database or Redis are over SSL/TLS, Java 21 introduces stricter security defaults. If connection pooling isn't configured correctly and connections are being rapidly recreated instead of reused, the CPU overhead of Java 21's TLS handshakes at 18,000 concurrent requests will cause an immediate bottleneck.

  • Action: Verify that your Redis client and JDBC drivers are updated to their latest, Java 21-certified versions.
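One way to act on that last point is to confirm which client versions are actually deployed, since the version in your POM and the bundle on disk can drift. A sketch, assuming shell access to the publisher: the here-doc manifest is stand-in data; with a real bundle you would read the manifest from the jar, e.g. `unzip -p <bundle.jar> META-INF/MANIFEST.MF`.

```shell
# Stand-in for a manifest extracted from a deployed client bundle
cat > MANIFEST.MF <<'EOF'
Bundle-SymbolicName: redis.clients.jedis
Bundle-Version: 3.7.0
EOF

# Print the deployed version to compare against the vendor's Java 21-tested release
grep '^Bundle-Version' MANIFEST.MF
```

The same check applies to the JDBC driver and to Apache Commons Pool if Jedis pulls it in transitively.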

Level 2
May 10, 2026

Hi @akhilraj,

We hit something similar after a Java upgrade on an AEM 6.5 instance, not identical but close enough that some of these should be worth checking.
The jump from under 1 second to 11 seconds under the same load is a pretty classic sign of thread contention or connection pool exhaustion rather than raw compute performance. Java 21 handles virtual threads differently and some of the underlying Jetty and Felix HTTP settings that worked fine on older JVMs can become bottlenecks under high concurrency when you upgrade.
First thing I’d check is your Sling thread pool configuration in the OSGi console. Specifically the minimum and maximum thread counts for the default thread pool. These defaults are often too conservative for 18,000 concurrent users and don’t automatically scale with the JVM upgrade. Under heavy load you’ll see requests queuing behind available threads which shows up as that kind of dramatic latency spike.
Second thing is your Redis connection pool settings. If your APIs are fetching from database and caching in Redis on every homepage load, under 18k concurrent users you can saturate the Redis connection pool fast. Check your Jedis or Lettuce pool configuration maxTotal, maxIdle, and the timeout values. If connections are being borrowed and not returned fast enough you’ll see exactly this kind of latency pattern where everything works functionally but response times blow up under load.
Third is the Azure OIDC token validation piece. If every API request is going through OIDC token validation on the AEM side, that validation call can become a bottleneck under load if it’s not being cached. After an upgrade it’s worth checking whether your token cache is still wired correctly or if it got reset to defaults.
For Java 21 specifically, check if G1GC is still your garbage collector and look at GC pause times in your logs during the load test window. Sometimes a JVM upgrade shifts GC behavior in ways that show up only under sustained load.
Would be helpful to know if the degradation is consistent throughout the 30 minutes or if it starts fine and degrades over time that would point more toward memory pressure or connection pool exhaustion vs a configuration issue that’s constant from the start.
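A quick way to check the GC-pause angle is to scan the unified GC log for pauses above the configured target. A sketch: the here-doc lines stand in for the real `crx-quickstart/logs/gc.log` produced by `-Xlog:gc*`, and the 150 threshold matches a `MaxGCPauseMillis=150` target.

```shell
# Stand-in unified-GC-log lines (Java 11+ -Xlog:gc format)
cat > gc.log <<'EOF'
[12.345s][info][gc] GC(10) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(16384M) 43.210ms
[45.678s][info][gc] GC(11) Pause Young (Normal) (G1 Evacuation Pause) 900M->150M(16384M) 512.004ms
EOF

# Print every pause longer than the 150ms target; the pause time is the last field
awk '/Pause/ { ms=$NF; sub(/ms/,"",ms); if (ms+0 > 150) print }' gc.log
```

If long pauses cluster inside the load window but the latency spike is flat across the whole 30 minutes, GC is probably a contributor rather than the sole cause.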

bhimanandak6148
Level 2
May 11, 2026

I am from Akhil's team. We are noticing the issue only under high load; we observed the same issue on AEM 6.5 SP21. After upgrading (AEM 6.5 LTS SP2), all daily and weekly maintenance jobs completed successfully, and the application also works fine under normal load. However, API calls show much higher response times on AEM 6.5 LTS SP2 (for example: authentication took about 1 second under high load earlier; now it takes around 8 to 11 seconds).

We are using the Sling thread pool configuration below in the OSGi console [screenshot not included], along with the following Oracle Java 21 parameters in the quickstart start script.

# Base JVM options - optimized for AEM 6.5 LTS
CQ_JVM_OPTS="-server -Xms16g -Xmx16g\
 -XX:+UseG1GC -XX:MaxGCPauseMillis=150 -XX:+UseStringDeduplication\
 -XX:InitiatingHeapOccupancyPercent=65 -XX:G1HeapRegionSize=16m\
 -XX:+HeapDumpOnOutOfMemoryError\
 -XX:HeapDumpPath=/data/adobe-aem6.3/publish/crx-quickstart/logs/heapdump\
 -XX:ErrorFile=/data/adobe-aem6.3/publish/crx-quickstart/logs/hs_err_pid%p.log\
 -Xlog:gc*:/data/adobe-aem6.3/publish/crx-quickstart/logs/gc.log:time,uptime,level,tags:filecount=5,filesize=50M\
 -Djava.io.tmpdir=/data/tmp -Djdk.xml.entityExpansionLimit=25000\
 -Djavax.net.ssl.trustStore=/usr/java/latest/lib/security/cacerts\
 -Dsun.zip.disableMemoryMapping=true -Djava.awt.headless=true\
 -Djava.security.egd=file:/dev/./urandom\
 -Doak.fastQuerySize=true -Doak.queryLimitInMemory=100000\
 -Doak.queryLimitReads=1000000 -Dupdate.limit=100000\
 -Doak.queryFailTraversal=true\
 -Djdk.util.zip.disableZip64ExtraFieldValidation=true"


    # Essential module opens for AEM 6.5 LTS
    JPMS_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.base/java.io=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.base/java.net=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.base/java.util=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.base/sun.net.www.protocol.jrt=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.desktop/java.awt.image=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.desktop/com.sun.imageio.plugins.jpeg=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.naming/javax.naming=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.naming/javax.naming.spi=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-opens=java.xml/com.sun.org.apache.xerces.internal.dom=ALL-UNNAMED"
    JPMS_OPTS="${JPMS_OPTS} --add-exports=java.xml/com.sun.org.apache.xml.internal.serialize=ALL-UNNAMED"
 



When we start the performance test, under full load we notice locking on the Event Dispatcher [profiler screenshot not included].

Regards
Bhima

Level 2
May 11, 2026

Thanks for sharing the additional context, @bhimanandak6148. The fact that someone from your team confirmed this persists in AEM 6.5 LTS SP2 as well is really useful information for others hitting this issue.
Looking at your JVM config and the thread pool screenshots together, a few things stand out.
Your heap is set to 16g with G1GC and MaxGCPauseMillis at 150ms which is reasonable, but with Java 21 and 18,000 concurrent users the GC pause target can become a bottleneck in a different way. G1GC in Java 21 behaves differently under sustained concurrency compared to Java 11 specifically around region allocation and concurrent marking. If your load test shows the degradation is consistent from the start rather than creeping in over time, I’d look at whether GC pauses are spiking during the load window even briefly, because 150ms pauses that weren’t visible under Java 11 can cascade badly at 18k concurrent requests.
The locking on the Event Dispatcher that Bhima mentioned from the profiler screenshot is the most interesting signal here. That specific contention point in AEM's publisher typically shows up when a high volume of observation events is being fired and not consumed fast enough, and it can get significantly worse with Java 21's NIO changes, as @chaudharynick mentioned, because the socket layer now surfaces contention that was previously hidden.

One thing worth checking that nobody has mentioned yet: look at your org.apache.sling.event configuration, specifically the queue.threadPoolSize for the main event queue. Under high load with Java 21, the default event-queue thread pool can become undersized even if the Sling Job Thread Pool looks fine, because they are separate pools. The authentication delay Bhima mentioned (1 second going to 8-11 seconds) is very consistent with event-queue saturation on the authentication path specifically.

Would be helpful to know what the thread dump shows during the load window. If you see a large number of threads blocked waiting on the event dispatcher, that confirms this direction.
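A thread-dump triage sketch along those lines. The here-doc is stand-in data (the thread names and the QueueManager frame are illustrative); with real dumps you would point the greps at the files captured via `jcmd <pid> Thread.print` during the load window.

```shell
# Stand-in thread-dump excerpt: one blocked thread, one runnable thread
cat > threads.txt <<'EOF'
"pool-7-thread-3" #91 prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.sling.event.impl.jobs.queues.QueueManager.run(QueueManager.java)
"pool-7-thread-4" #92 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.read(SocketInputStream.java)
EOF

# How many threads are blocked, and are any stuck in the Sling event code path?
grep -c 'Thread.State: BLOCKED' threads.txt
grep -A2 'Thread.State: BLOCKED' threads.txt | grep org.apache.sling.event
```

A large, stable count of BLOCKED threads sitting in org.apache.sling.event frames across several consecutive dumps would support the event-dispatcher theory.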