SOLVED

content invalidation best practices


Level 2

Hi there,

 

We are experiencing significant issues in our AEM infrastructure caused by massive content updates combined with parallel dispatcher cache invalidation. Cache invalidation is driven by flush agents on the publish instances. One of the symptoms is that the publish queues get stuck, probably due to the heavy load on the publishers, which have to handle activation and rendering at the same time.

Currently, any content update pushed from author to publish invalidates the complete dispatcher cache, which (besides the heavy load) also leads to a poor cache hit ratio.

The problem is exacerbated further by the fact that we have multiple publishers and dispatchers in many-to-many (spaghetti) relationships, which seems to lead to "invalidation storms": as content gets replicated to multiple publishers, each publisher triggers its own round of invalidations on the dispatchers.

It would be helpful if you could provide some pointers on whether such problems are common and whether they should be solved via

  1. infra remediation (physical partitioning of the CQ5 infrastructure into website-specific farms)
    1. this seems to be quite the opposite of "scale out" and modern cloud architectures
  2. content model changes
    1. using folder-level (partial) invalidation (multiple stat files), which would at least reduce the amount of invalidated content
    2. are there any content modelling guidelines that allow for better management of asset dependencies (i.e. not requiring invalidation of the whole website if a label is changed)?
  3. custom invalidation/deployment workflows
    1. a custom workflow runs on the author which, after publishing to the activation queues, monitors those queues; once a queue is empty again, invalidation requests are sent to the individual dispatchers. This would remove the race condition (activation vs. cache invalidation) and avoid invalidating the cache multiple times for a single update (a rough sketch of what we have in mind follows below).
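
To illustrate what we have in mind for 3.1, here is a rough sketch in plain Java (host names are hypothetical, and the queue-monitoring part via the replication API is left out) of the single flush request per dispatcher that such a workflow step would send once the queues have drained:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/*
 * Rough illustration of option 3.1: once the author-side workflow has seen the
 * replication queues drain, it sends exactly one invalidation request per
 * dispatcher for the changed path, instead of every publish instance flushing
 * the cache on its own.
 */
public class DispatcherFlusher {

    // Hypothetical flush endpoints of the individual dispatchers.
    private static final String[] DISPATCHERS = {
            "http://dispatcher1.example.com",
            "http://dispatcher2.example.com"
    };

    public static void flush(String contentPath) throws IOException {
        for (String dispatcher : DISPATCHERS) {
            URL url = new URL(dispatcher + "/dispatcher/invalidate.cache");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            // Standard dispatcher flush headers: CQ-Handle names the path whose
            // stat file(s) should be touched; the empty body yields Content-Length: 0.
            con.setRequestProperty("CQ-Action", "Activate");
            con.setRequestProperty("CQ-Handle", contentPath);
            con.setRequestProperty("Content-Type", "application/octet-stream");
            con.getOutputStream().close();
            int status = con.getResponseCode();
            con.disconnect();
            if (status != 200) {
                throw new IOException("Flush of " + contentPath + " on " + dispatcher
                        + " returned HTTP " + status);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: invalidate only the subtree that contains the changed page.
        flush("/content/mysite/en/news");
    }
}

The point would be that the flush is triggered exactly once per change, and only for the affected subtree, regardless of how many publishers received the activation.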

 

Thx in advance,

 

Nick

5 Replies


Level 10

Here is a Dispatcher document that may help:

http://helpx.adobe.com/experience-manager/using/dispatcher-faq.html 

2 "Are there any content modelling guidelines" -- it is recommended that you read the following: http://dev.day.com/docs/en/cq/current/howto/model_data.html.

3. With respect to (deployment) workflows -- there are a number of AEM workflow topics that I recommend you read:

[img]WOrkflows.png[/img]

You can find these Workflow topics here:

http://dev.day.com/docs/en/cq/current.html


Level 10

Have you tried batch replication & cache refill (tip 2 at [1]) before going the custom way?

[1]  http://my.adobeconnect.com/p7th2gf8k43/

 

https://github.com/cqsupport/webinar-dispatchercache


Level 2

Thanks for the comments. I did read the content modelling guidelines, but they seem to apply mostly to semi-structured data (e.g. blog posts) rather than to the individual HTML pages of a traditional website.

From what I can read, HTML content is considered complex, and therefore, from a cache-flushing perspective, auto-invalidation is recommended rather than content updates (as we probably don't know which cached resources actually need to be replaced/deleted).

Question 1: Are there any design recommendations for HTML resources so that we could at least use multi-level stat files, thereby isolating invalidation at an individual subtree level? This is about the breadth of the invalidation -- how can it be reduced?
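
For context, what I have in mind is something along the lines of the /statfileslevel directive in the /cache section of dispatcher.any; the level and paths below are only an illustrative sketch, not our actual configuration:

/cache
  {
  /docroot "/opt/dispatcher/cache"
  # Maintain .stat files down to this depth below the docroot, so that an
  # activation under /content/mysite/en/news only touches the stat files on
  # that branch and siblings such as /content/mysite/de stay cached.
  /statfileslevel "3"
  /invalidate
    {
    /0000 { /glob "*" /type "deny" }
    /0001 { /glob "*.html" /type "allow" }
    }
  }

The open question for us is how to structure the HTML content so that such a level actually matches the dependency boundaries of the site.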

The other issue on which I can't find any related comments is what we call an "invalidation storm". Because we have multiple dispatchers and publishers, connected in a many-to-many fashion for load-balancing reasons, once we replicate and activate resources to all the publishers, every single publisher in turn sends invalidation requests (invalidating the whole site!) to all dispatchers. That means that updating a single page invalidates, and thereby re-renders, that page 5 times (because we have 5 publishers).

Question 2: How can we ensure that for every change/activation event (i.e. one resource has been changed), the dispatcher cache is invalidated only once?
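
One idea we are considering (not sure whether this is the recommended approach) is to allow only a single flush source per dispatcher -- e.g. a flush agent on the author or on one designated publisher -- and to lock the dispatcher down accordingly via /allowedClients; the IP below is just an illustrative sketch:

/cache
  {
  # Accept invalidation requests only from the one host that is allowed to
  # flush, so the remaining publishers can no longer trigger redundant flushes.
  /allowedClients
    {
    /0000 { /glob "*" /type "deny" }
    /0001 { /glob "10.0.0.10" /type "allow" }
    }
  }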

 

thx a lot, Nick


Level 2

Thanks for the comments, and the video is certainly valuable.

I assume you are referring to using a re-fetch flush agent, which would help reduce the simultaneous requests hitting the publisher. This is certainly something we will be looking at; I do doubt, though, whether it will fix the key issue.

The reason for my statement is that the overload directly after invalidation shouldn't last very long: as soon as the first of the simultaneous requests successfully returns (let's say after 3 seconds), the rendered page gets cached and all subsequent calls are served the cached page.

Our key issue is more about how to limit the breadth of invalidation (if a label changes, the complete site currently gets invalidated) as well as the invalidation frequency.

I couldn't find the batch replication aspect that you mentioned. Can you be more specific about what you are referring to?

 

thx in advance

 

Nick


Correct answer by
Level 10