Hi there,
We are experiencing significant issues in our AEM infrastructure, caused by massive content updates combined with parallel dispatcher cache invalidation. Cache invalidation is driven by flush agents on the publish instances. One of the symptoms is that the publish queues get stuck, probably due to heavy load on the publishers, which have to handle activation and rendering at the same time.
Currently, any content update pushed from author to publish invalidates the complete dispatcher cache, which (besides the heavy load) also leads to a poor cache hit ratio.
The problem seems to be exacerbated further by the fact that we have multiple publishers and dispatchers in a many-to-many (spaghetti) relationship, which leads to "invalidation storms": content replicated to multiple publishers triggers repeated invalidations on the dispatchers.
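For reference, I assume this comes down to the stock /cache section in our dispatcher.any (a sketch from memory; the docroot path is illustrative):

/cache
  {
  /docroot "/var/www/html"
  # statfileslevel 0 (the default) keeps a single .stat file at the
  # docroot, so every flush touches it and stales the entire site
  /statfileslevel "0"
  /invalidate
    {
    /0000 { /glob "*" /type "deny" }
    /0001 { /glob "*.html" /type "allow" }
    }
  }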
It would be helpful if you could provide some pointers on whether such problems are common and how they are typically solved.
Thx in advance,
Nick
I will create an article on batch replication by tomorrow and update you. For partial flushing, see if the links below help.
http://adobe-consulting-services.github.io/acs-aem-commons/features/dispatcher-flush-ui.html
http://adobe-consulting-services.github.io/acs-aem-commons/features/dispatcher-flush-rules.html
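As I read the second page, a minimal flush-rules setup would look roughly like the following OSGi factory configuration (a sketch; PID and property names as documented there, and the /content/mysite paths are placeholders):

# e.g. /apps/mysite/config/com.adobe.acs.commons.replication.dispatcher.impl.DispatcherFlushRulesImpl-mysite.config
# Reuse the action (ACTIVATE/DELETE/...) of the original replication event
prop.replication-action-type="INHERIT"
# When something under /content/mysite/labels is replicated, additionally
# flush the pages that render those labels, instead of flushing everything
prop.rules.hierarchical=["/content/mysite/labels/.*=/content/mysite/en"]

Please double-check the exact property names against that page for your version.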
1. Here is a Dispatcher document that may help:
http://helpx.adobe.com/experience-manager/using/dispatcher-faq.html
2 "Are there any content modelling guidelines" -- it is recommended that you read the following: http://dev.day.com/docs/en/cq/current/howto/model_data.html.
3. With respect to (deployment) workflows -- there are lots of AEM topics that I recommend you read:
[Image: Workflows.png -- a list of AEM Workflow documentation topics]
You can find these Workflow topics here:
Have you tried batch replication & cache refill (tip 2 at [1]) before going the custom route?
[1] http://my.adobeconnect.com/p7th2gf8k43/
Thanks for the comments. I did read the content modelling guidelines, but they seem to apply mostly to semi-structured data (e.g. blog posts) rather than to individual HTML pages in a traditional web site.
From what I can read, HTML content is considered complex, and therefore, from a cache flushing perspective, auto-invalidation is recommended rather than content updates (as we probably don't know which resources actually need to be replaced/deleted).
Question 1: Are there any design recommendations for HTML resources so that we could at least use multi-level stat files, thereby isolating invalidation at an individual subtree level? This is about the breadth of the invalidation: how can it be reduced?
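For illustration, what I have in mind is something like this in dispatcher.any (a sketch; the level value is arbitrary):

/cache
  {
  # With statfileslevel > 0 the dispatcher maintains .stat files per
  # directory down to this depth, so an invalidation only stales the
  # subtree governed by the touched .stat instead of the whole docroot
  /statfileslevel "3"
  }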
The other issue, on which I can't find any related comments, is what we call an "invalidation storm". Because we have multiple dispatchers and publishers, connected in a many-to-many fashion for load balancing reasons, once we replicate and activate resources to all the publishers, every single publisher in turn sends invalidation requests (invalidating the whole site!) to all dispatchers. That means that updating a single page invalidates, and thereby re-renders, that single page 5 times (because we have 5 publishers).
Question 2: How can we ensure that for every change/activation event (i.e. one resource has been changed), the dispatcher cache is invalidated only once?
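For context, the only workaround I can think of (a sketch; host names are placeholders) would be to disable the flush agents on the publishers and drive the flush from the author instead, one agent per dispatcher, so that each activation yields exactly one invalidation per dispatcher:

# Author-side dispatcher flush agent, e.g. stored under
# /etc/replication/agents.author/flush-dispatcher1/jcr:content
enabled             = "true"
serializationType   = "flush"
transportUri        = "http://dispatcher1.example.com:80/dispatcher/invalidate.cache"
protocolHTTPHeaders = ["CQ-Action:{action}", "CQ-Handle:{path}", "CQ-Path:{path}"]

I am aware this could flush before all publishers have finished processing the activation, so I am not sure it is the recommended approach -- hence the question.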
thx a lot, Nick
Thanks for the comments, and the video is certainly valuable.
I assume you are referring to using a re-fetch agent, which would help reduce the simultaneous requests hitting the publisher. This is certainly something we will be looking at, though I doubt it will fix the key issue.
The reason for my statement is that this overload directly after invalidation shouldn't last very long. As soon as the first of the simultaneous requests successfully returns (let's say after 3 secs), the rendered page gets cached, and all subsequent calls will be served the cached page.
Our key issue is more about how to limit the breadth of invalidation (if a label changes, the complete site currently gets invalidated) as well as the invalidation frequency.
I couldn't find the batch replication aspect that you mentioned above. Can you be more specific about what you are referring to?
thx in advance
Nick