Let me first describe our 2 agents on our author environment:
- Dispatcher flush agent (we have a dispatcher in front of our author environment to add an extra layer of caching for the content authors)
- Default agent (used for publishing to the publish environment)
NOTE: obviously 2 different entities.
The dispatcher flush agent is new in our setup and is triggered on modification. After adding the agent, our content authors complained that their changes were being published automatically. After some investigation/debugging of com.day.cq.replication.impl.ReplicatorImpl, it seems that a replication event is also fired by the RolloutManager with the following ReplicationOptions (note: no AgentIdFilter or ReplicateOnModification filter):
ReplicationOptions{synchronous=false, revision='null', suppressStatusUpdate=false, suppressVersions=false, filter=null, aggregateHandler=null}
With no filter, all enabled agents are triggered, so the path gets replicated to the publishers, which is NOT what we want...
It seems to me that the combination of a dispatcher flush agent and the default agent is not working properly. Is anyone experiencing the same problem?
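To make the suspected mechanism concrete, here is a minimal, self-contained sketch of how a null filter could end up selecting every enabled agent. It is only a model of the behaviour described above, not AEM's actual implementation: the `Agent` class, `select` and `byId` are hypothetical names invented for illustration. In AEM itself the corresponding pieces would be `ReplicationOptions.setFilter(...)` with an `AgentIdFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class AgentSelectionSketch {

    // Hypothetical stand-in for a replication agent; this is NOT the AEM Agent interface.
    static final class Agent {
        final String id;
        final boolean enabled;

        Agent(String id, boolean enabled) {
            this.id = id;
            this.enabled = enabled;
        }
    }

    // Returns the ids of the agents that would handle a request.
    // Assumption: a null filter means "no restriction", as observed in this thread.
    static List<String> select(List<Agent> agents, Predicate<Agent> filter) {
        List<String> selected = new ArrayList<>();
        for (Agent agent : agents) {
            if (agent.enabled && (filter == null || filter.test(agent))) {
                selected.add(agent.id);
            }
        }
        return selected;
    }

    // Hypothetical helper playing the role of an id-based agent filter.
    static Predicate<Agent> byId(String... ids) {
        List<String> wanted = Arrays.asList(ids);
        return agent -> wanted.contains(agent.id);
    }

    public static void main(String[] args) {
        List<Agent> agents = Arrays.asList(new Agent("publish", true), new Agent("flush", true));
        System.out.println(select(agents, null));           // prints [publish, flush]
        System.out.println(select(agents, byId("flush")));  // prints [flush]
    }
}
```

Under this model, a request that carries an id filter reaches only the flush agent, while a request with filter=null reaches the publish agent as well.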
Since the flush agent is on author, could you validate that the "Triggers" tab of the flush agent doesn't have any checkbox turned on esp. Ignore Default or On Modification?
Are you doing programmatic replication/flushing or any workflow based replication/flushing of the content using ReplicationOptions?
This is our triggers tab:
And no workflows are doing such a thing in our codebase at the moment. Wouldn't I see our custom class if we had one when debugging?
In any case, the RolloutManager would always fire a replication event if it's configured that way; it does not depend on whether a flush agent is configured (or even a combination of a standard replication agent and a flush agent).
I assume that it's just a coincidence of these 2 activities, and that the flush agent has nothing to do with it. Have you changed your rollout configuration recently to include replication?
Jörg
I am testing it on a local machine with a default agent and a dispatcher flush agent pointing to our acceptance environment, so it can't be a coincidence: I see it every time and I am the only one testing on my local machine. We have the following rollout configs on our livecopy:
- /etc/msm/rolloutconfigs/pushonmodify
- /etc/msm/rolloutconfigs/activate
- /etc/msm/rolloutconfigs/deactivate
We had the above config from the start of the project and never had any pages publish automatically on edit. We only want the dispatcher flush on edit for now.
Ah, that's a tough one then. But if you can replicate it, it's getting easier :-)
So you think that the invalidation agent is also triggering the default replication agent, which is then publishing the changed page?
Something like that is happening I think. Not sure how to fix it.
How is your "invalidation" agent configured on author? You should have these checkboxes ticked on the "Triggers" tab:
* Ignore default
* On Modification
* No Status Update
* No Versioning
This should prevent a number of unwanted side effects on regular processes (especially "no status update" and "no versioning"). On the other hand, every invalidation request will also trigger an OSGI event about a successful replication event and I don't think that this can be suppressed.
Apart from that, I am not aware of any side effect of such an invalidation agent; it will definitely not trigger any other replication agent.
The problem you face might be caused indirectly by this. Does your code react to these OSGI events?
Jörg
I've updated the trigger for the invalidation agent but it still seems to trigger a replication for some reason. If it was our own code wouldn't I be able to see it while debugging?
When I set the logging level for the agent to info I can see the following: https://pastebin.com/raw/kc6tVY4r On line 14 you can see 'com.day.cq.replication.impl.ReplicatorImpl' setting up a replication with no options for some reason:
*INFO* [Thread-1352] com.day.cq.replication.impl.ReplicatorImpl Setting up replication with options: ReplicationOptions{synchronous=false, revision='null', suppressStatusUpdate=false, suppressVersions=false, filter=null, aggregateHandler=null}
I see no custom code intervening whatsoever.
UPDATE: I tested something on 2 different livecopies. On LiveCopy A and B I had 3 rollout configurations set:
- /etc/msm/rolloutconfigs/pushonmodify
- /etc/msm/rolloutconfigs/activate
- /etc/msm/rolloutconfigs/deactivate
When I removed '/etc/msm/rolloutconfigs/activate' from LiveCopy A, only LiveCopy B got automatically published when modifying the page. Of course we need '/etc/msm/rolloutconfigs/activate' to be there for every LiveCopy because we don't want our authors to publish each LiveCopy page individually. This really seems to be an AEM related bug.
I'm able to reproduce the behavior you mentioned only if I turn on "On modification" checkbox of either publish agent or flush/invalidation agent and this makes sense because 'pushonmodify' would trigger 'onModify' event which these agents would capture and trigger the replication & flush accordingly.
If 'on modification' is left unchecked on both the publish agent and the invalidation agent, I do not see any publish activity with either the 'pushonmodify' or 'activate' rollouts unless the author publishes the content manually. Even removing 'pushonmodify' and doing a manual rollout doesn't trigger publish activity.
A couple of questions/tasks:
reference - MSM Best Practices,
When using the rollout trigger onModify you should consider that:
Therefore, it is recommended that you only use onModify triggers if the benefits of automatic rollout initiation outweigh any potential performance issues.
How have you configured the flush agent to be triggered automatically on modification as you have mentioned in the description? Is it via "On modification" checkbox under "Triggers" tab or something else?
The dispatcher flush agent is the only agent that has the trigger 'On Modification' checked on its triggers tab.
Do you have "On modification" checkbox turned on under "Triggers" tab for default publish agent?
The publish agent has no triggers checked on its triggers tab.
Could you disable both default publish agent and dispatcher flush agent and check the blocked queues for both agents http://localhost:4502/etc/replication/agents.author/flush.html and http://localhost:4502/etc/replication/agents.author/publish.html
Publish agent
Just to be clear - If you remove/disable the flush/invalidation agent then you do not observe this kind of auto-replication behavior? You could validate same with above mentioned step of checking publish agent queue/logs.
It seems that when the dispatcher flush agent is disabled, I indeed don't see the auto-replication behaviour on modification. Only when I actually publish the page is the corresponding item added to the publish agent queue.
Could you remove 'pushonmodify' from live copies and do a manual rollout from source page and check the behavior if automated publish still happens?
When I replace my LiveCopy rollout configuration:
/etc/msm/rolloutconfigs/activate
/etc/msm/rolloutconfigs/deactivate
/etc/msm/rolloutconfigs/pushonmodify
BY:
/etc/msm/rolloutconfigs/activate
/etc/msm/rolloutconfigs/deactivate
/etc/msm/rolloutconfigs/default
And trigger a manual rollout using http://localhost:4502/etc/blueprints.html , the auto-replication behaviour is NOT happening anymore. We benefit a lot from the '/etc/msm/rolloutconfigs/pushonmodify' config, we are aware of the potential negative performance impact.
Editing a component -
pushonmodify rollout pushed the content changes from source to live copies and then generated 'onmodify' event which was captured by flush agent and hence you see two invalidation requests with 'replication-service' triggered by flush agent.
I'm not sure why 'webservice-support-replication' got triggered for the child pages. Here's the explanation & fix for same [1]. Now 'msm-service' publish and invalidation request got triggered from the 'onmodify' trigger that was launched by 'pushonmodify' rollout. I believe this is what you mentioned as the problem statement. I'm not sure if this is a bug or expected functionality.
On publish-
Since admin user triggered the publish hence the first row is fine. In this case 'webservice-support-replication' [1] still got triggered. The rows with 'msm-service' are fine as it was a publish + invalidation request.
[1] - Flush replications are triggered by webservice-support-replication user | AEM 6.x
Per my knowledge, the common listener for publish agent and invalidation agent doesn't distinguish the action to be taken if the filter agent is null (which is the case here for onmodify event). Probably Jörg Hoh can help here..
The stack trace for 'Thread-1349' might be waiting on the agent itself because the queue was blocked.
"Now 'msm-service' publish and invalidation request got triggered from the 'onmodify' trigger that was launched by 'pushonmodify' rollout. I believe this is what you mentioned as the problem statement. I'm not sure if this is a bug or expected functionality." Indeed I am not sure as well :S
Strange. I see that the thread "ReplicateOnModification Processor" performs a replication with options and a filter set. That looks good. On the other hand, in line 14 a replication is set up without any filter (as you mentioned), but this is done by "Thread-1352". In AEM and Sling I would expect the product to use threadpools to manage such threads; an unmanaged thread looks a bit suspicious to me. Can you try to repeat your scenario and take thread dumps to get a full stacktrace? That can give us a further indication about the root cause.
When I navigate to http://localhost:4502/system/console/status-jstack-threaddump and repeat the scenario and look for 'Thread-1349' I find the following:
"Thread-1349" #3948 daemon prio=1 os_prio=31 tid=0x00007fa7c8aeb000 nid=0x16107 waiting on condition [0x00007000112a1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000074d6967e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am not sure if this is of any use.
That's the stacktrace of an unused thread in a threadpool. To me it looks like this threadpool was not created using Sling mechanics, otherwise the thread name would be more meaningful (something like "pool-20-thread-7") or even a speaking name ("oak-observation-1"). Are you sure that this thread is the one in question?
On the other hand, I see a "thread-25" here on my local AEM 6.4 instance which has the same stacktrace. That means it can still be part of the product. Let me investigate that.
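As a side note on the thread names: the following plain-Java sketch (no AEM or Sling involved; the class and method names are made up for illustration) shows how the thread factory decides what a pool worker is called. The JDK default factory yields names of the form "pool-N-thread-M", while a factory that simply calls `new Thread(runnable)` yields "Thread-N", which is the pattern seen in the dump above. It does not tell us which component created the pool in question.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class ThreadNameSketch {

    // Runs one task on a single-thread pool built with the given factory
    // and returns the name of the worker thread that executed it.
    static String workerName(ThreadFactory factory) {
        ExecutorService pool = Executors.newSingleThreadExecutor(factory);
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Let the worker terminate so the JVM can exit.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // JDK default factory: name looks like "pool-1-thread-1".
        System.out.println(workerName(Executors.defaultThreadFactory()));
        // Factory that just calls new Thread(runnable): name looks like "Thread-0".
        System.out.println(workerName(Thread::new));
    }
}
```

While idle, a worker of such a pool parks in LinkedBlockingQueue.take via ThreadPoolExecutor.getTask, which matches the stacktrace posted above.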
I investigated the empty ReplicationOptions thread using VisualVM:
[Thread-3788] com.day.cq.replication.impl.ReplicatorImpl Setting up replication with options:
ReplicationOptions{synchronous=false, revision='null', suppressStatusUpdate=false, suppressVersions=false, filter=null, aggregateHandler=null}
I see no custom code interfering, it seems like it's all part of the product.
Do you have any idea Jörg?
Your dump indicates to me that the MSM rollout invokes the TargetActivateActionFactory, which invokes replication. And this replication might trigger all agents. But that does not seem to match your initial problem description, which I read this way: the invalidation agent (with the on-modification trigger set) causes the "standard" replication to get fired.
Yes, when I enable it, it does trigger the other default replication for some reason. So it still seems like a bug to me.
An old thread but this might help-
Replicating and controlling live copy replication
In this use case, the live copy had rollout config 'Activate on blueprint activation'..