While working on my translation project in AEM 6.3, I use the References panel to update/promote language copies from EN to other languages and to synchronize/roll out the language copies to live copies. However, for all the system users the References panel does not load properly and throws a timeout error, whereas if I log in with my administrator credentials it loads quickly.
I am not sure whether response times differ between user levels in AEM, particularly for queries like loading all references of pages in the JCR.
I have attached the screenshot of the error.
When we checked the browser debug tool, it shows this
also the link to that URL in the Network tab.
Please help me figure out this issue, and also suggest some way to improve the performance of the tool.
Is this related to slow performance or with timeout/504 that happens intermittently?
If you suspect that admin users perform better than others, I would recommend logging the queries and related information for this use case for further debugging:
Enable logging for com.day.cq.search and your code package. This would provide the query and the time taken to execute it. Compare the behavior and data for admin and non-admin users.
Check which queries run, what their response times are, which indexes are applied to each query, and whether the indexes need optimizing. Does this use case run queries that are not supposed to run? Compare this for both admin and non-admin users.
Check the component timings on author in developer mode. Does this happen on the publish server as well, and across environments or only in a single environment?
Check what kind of restrictions authors have on assets/referenced assets. Do you have custom authorization or something related in place that might be a potential bottleneck? Does your project use any specific customization for ACLs? Are you on the latest service pack/CFP for your AEM version?
Depending on the setup, check the server logs (AEM logs, web server logs, dispatcher logs, etc.) for warnings/errors related to this use case.
Check the behavior of author standalone vs author with dispatcher vs author with dispatcher, LB and other proxies. As you've mentioned that bypassing the dispatcher works fine, check the dispatcher configuration and experiment with different settings, such as increasing the timeout value. Open the dispatcher.any file and change the value of /receiveTimeout to 0 for testing.
Isolate application issues from network issues and hardware issues. You suspect issues with ACLs within AEM, but you've also mentioned that bypassing the dispatcher works fine, which means that the root cause is probably outside AEM. It would be better to work with a debugging strategy for a quick turnaround.
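To put numbers on the admin vs non-admin comparison suggested above, something like the following Python sketch might help. It is only a sketch: it assumes the author instance accepts basic authentication, and the function names, the URL you pass in (the References panel request copied from the Network tab) and the user/password mapping are all hypothetical placeholders rather than anything provided by AEM.

```python
import base64
import time
import urllib.error
import urllib.request
from statistics import median
from typing import Callable, Dict, Iterable, List, Tuple


def make_fetch(url: str, passwords: Dict[str, str], timeout: float = 180.0) -> Callable[[str], int]:
    """Build fetch(user) -> HTTP status for `url` (assumes basic auth is accepted)."""
    def fetch(user: str) -> int:
        token = base64.b64encode(f"{user}:{passwords[user]}".encode()).decode()
        request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                response.read()
                return response.status
        except urllib.error.HTTPError as error:
            # Error statuses such as 504 arrive as exceptions; report the code instead.
            return error.code
    return fetch


def time_requests(fetch: Callable[[str], int], user: str, runs: int = 5) -> List[Tuple[float, int]]:
    """Call fetch(user) `runs` times; return (elapsed seconds, status) per call."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        status = fetch(user)
        results.append((time.perf_counter() - start, status))
    return results


def compare_users(fetch: Callable[[str], int], users: Iterable[str], runs: int = 5) -> Dict[str, dict]:
    """Summarise median response time and the statuses seen for each user."""
    summary = {}
    for user in users:
        results = time_requests(fetch, user, runs)
        summary[user] = {
            "median_seconds": median(elapsed for elapsed, _ in results),
            "statuses": [status for _, status in results],
        }
    return summary
```

Running compare_users once against the dispatcher URL and once against the instance directly, with one admin and one author account, would show whether the gap comes from the user or from the route. Client-side timeouts and connection errors are deliberately not caught here, so they surface as exceptions.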
I'm not sure what kind of data/information you can share in this thread. It would be difficult for anyone to comment on any specific functionality unless we have specific data points. Alternatively, you could open a ticket with customer care with your findings.
Renders section of the dispatcher.any file:
# Hostname or IP of the render
# Port of the render
# Connect timeout in milliseconds, 0 to wait indefinitely
/timeout "5000"
For anyone experiencing this specific problem (a very slow or timing-out References panel in the Sites Touch UI), here is what turned out to be the culprit for us:
Stale 'launches' were building up in /content/launches/ due to our workflow which creates them as a part of doing regular MSM content rollouts. The vast quantity of nodes under launches (hundreds of thousands of nodes in our case) took well over a minute to process before the References panel could be displayed. For whatever reason, raw Admin account access allowed this process to complete more quickly, presumably due to simpler security settings for that account. Accessing the server directly by IP address or machine name allowed us to bypass the dispatcher timeouts which also helped, but ultimately what was necessary was deleting the old, stale launches that had already served their purpose.
The following solution is destructive; it assumes you can safely delete some of your mountain of old launches without losing any important information. I can think of relatively few good reasons to keep all your historical launches for posterity, but while I am a CMS guy, I am not your CMS guy: I do not know your data retention policy, your workflow, or your architecture. It is your responsibility to know whether this is a safe and prudent approach for you to take.
We had too many launches to display in the Sites Touch UI interface; the JSON request either timed out before the back end could reply with data so the list of launches never populated on the page, or it populated but the delete button never appeared, or it appeared but timed out trying to delete them. That approach was a bust.
While I suspect these could be deleted individually with the /crx/de/ interface, that would be slow and laborious.
From there, select Content Explorer, which will pop up a new window (check your popup blockers) with the explorer interface.
Navigate to /content/launches/ and you should see a hierarchy of /year/month/day/launchName. If your problem resembles our problem, expect to see many many launches when you drill down.
Now you can choose to delete the launches a day at a time or a month at a time. I suggest starting from the oldest and stopping a few months short of the current date, just in case the more recent launches have useful info in them. Just right-click the node in the left-hand pane, select 'Delete Recursive' and follow the prompts. In Firefox on the Mac at least, the windows pop open a bit too small and will need to be resized to see all the buttons in the recursive delete dialog.
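If you have an export of the launch paths and want to decide up front which ones fall before your cutoff, a small Python sketch along these lines may help. It assumes the /content/launches/year/month/day/launchName layout described above; the function names and the cutoff date are hypothetical, and it only selects paths, it deletes nothing.

```python
from datetime import date
from typing import Iterable, List, Optional


def launch_date(path: str) -> Optional[date]:
    """Read the date out of /content/launches/<year>/<month>/<day>/<launchName>."""
    parts = path.strip("/").split("/")
    if len(parts) < 5 or parts[:2] != ["content", "launches"]:
        return None
    try:
        return date(int(parts[2]), int(parts[3]), int(parts[4]))
    except ValueError:
        # Non-numeric or impossible dates are ignored rather than guessed at.
        return None


def stale_launches(paths: Iterable[str], cutoff: date) -> List[str]:
    """Return launch paths dated strictly before `cutoff`, oldest first."""
    dated = [(launch_date(path), path) for path in paths]
    older = [(day, path) for day, path in dated if day is not None and day < cutoff]
    return [path for _, path in sorted(older)]
```

Choosing the cutoff a few months back from today matches the advice above about leaving recent launches alone; anything the function cannot parse is left out of the result rather than flagged for deletion.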
Once you've purged a few tens (or hundreds) of thousands of old launches, you should find the references panel is a great deal more responsive for everyone.
I am logged in through my office VPN. Most of our users work on the same network and face the same problem. However, I have seen significantly better performance (of the reference load functionality) with admin users than with normal (author) users.
Does this have something to do with the access levels to files?
Also, whenever we access the servers by IP address, bypassing the dispatchers, it works fine consistently.
The screenshot shows a 504 Gateway Timeout error, which means that a server acting as a gateway or proxy did not get a timely response from the upstream server and gave up. In other words, your browser could not get a timely response, and the server it is connected to dropped the connection. Are you on a VM or a cloud server?
Get the network connection checked and traffic monitored for a period of time whenever this issue happens. Engage your IT-Network team to intercept the traffic via tools.
In addition to that, check the timeout on dispatcher and keep-alive on web server and other proxy servers that are a part of this network.
There shouldn't be any difference in response times between a normal user and an admin user.
This issue might be visible only on slow queries that run over the timeout limit specified on one of the intermediary servers in the network.