Replies

Jdruwe

21-01-2019

I've updated the trigger for the invalidation agent, but it still seems to trigger a replication for some reason. If it were our own code, wouldn't I be able to see it while debugging?

When I set the logging level for the agent to INFO, I can see the following: https://pastebin.com/raw/kc6tVY4r On line 14 you can see 'com.day.cq.replication.impl.ReplicatorImpl' setting up a replication with no filter set in its options for some reason:

*INFO* [Thread-1352] com.day.cq.replication.impl.ReplicatorImpl Setting up replication with options: ReplicationOptions{synchronous=false, revision='null', suppressStatusUpdate=false, suppressVersions=false, filter=null, aggregateHandler=null} 

I see no custom code intervening whatsoever.
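
As a side note, lines like the one above can be picked out of a log mechanically. The following is a minimal sketch in plain Java, assuming the log lines follow the format quoted above. The class and method names (UnfilteredReplicationFinder, threadsWithoutFilter) are made up for illustration and are not part of AEM. It returns the thread names of all "Setting up replication" lines whose options contain filter=null:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds replication set-up log lines that carry no filter. */
final class UnfilteredReplicationFinder {

    // Assumed line shape: "... [<thread>] com.day.cq.replication.impl.ReplicatorImpl
    // Setting up replication with options: ReplicationOptions{...}"
    private static final Pattern LINE = Pattern.compile(
        "\\[(.+?)\\]\\s+com\\.day\\.cq\\.replication\\.impl\\.ReplicatorImpl\\s+"
        + "Setting up replication with options:\\s+ReplicationOptions\\{(.*)\\}");

    /** Returns the thread name of every set-up line whose options contain filter=null. */
    static List<String> threadsWithoutFilter(List<String> logLines) {
        List<String> threads = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(2).contains("filter=null")) {
                threads.add(m.group(1));
            }
        }
        return threads;
    }
}
```

Feeding it the lines of the log file gives the thread names worth looking up in a thread dump or profiler.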

UPDATE: I tested something on 2 different livecopies. On LiveCopy A and B I had 3 rollout configurations set:

- /etc/msm/rolloutconfigs/pushonmodify

- /etc/msm/rolloutconfigs/activate

- /etc/msm/rolloutconfigs/deactivate

When I removed '/etc/msm/rolloutconfigs/activate' from LiveCopy A, only LiveCopy B got automatically published when modifying the page. Of course we need '/etc/msm/rolloutconfigs/activate' to be there for every LiveCopy, because we don't want our authors to have to publish each LiveCopy page individually. This really seems to be an AEM-related bug.

Gaurav-Behl

MVP

21-01-2019

I'm able to reproduce the behavior you mentioned only if I turn on the "On modification" checkbox of either the publish agent or the flush/invalidation agent. This makes sense, because 'pushonmodify' would trigger an 'onModify' event, which these agents would capture and then trigger the replication and flush accordingly.

If 'On modification' is left unchecked on both the publish agent and the invalidation agent, I do not see any publish activity with either the 'pushonmodify' or the 'activate' rollout unless the author publishes the content manually. Even removing 'pushonmodify' and doing a manual rollout doesn't trigger publish activity.

A couple of questions/tasks:

  • How have you configured the flush agent to be triggered automatically on modification as you have mentioned in the description? Is it via "On modification" checkbox under "Triggers" tab or something else?
  • Do you have "On modification" checkbox turned on under "Triggers" tab for default publish agent?
  • Could you disable both default publish agent and dispatcher flush agent and check the blocked queues for both agents - http://localhost:4502/etc/replication/agents.author/flush.html  and http://localhost:4502/etc/replication/agents.author/publish.html
    • Validate the user name that triggers the replication and flush? Is it 'msm-service' or 'replication-service' or some other?
    • Validate the sequence of tasks for flush and replication. If pushonmodify rollout/msm triggers that replication, then you should see the user name as 'msm-service'. If it's triggered by flush agent, then it would be with user 'replication-service'
  • Just to be clear - If you remove/disable the flush/invalidation agent then you do not observe this kind of auto-replication behavior? You could validate same with above mentioned step of checking publish agent queue/logs.
  • Could you remove 'pushonmodify' from live copies and do a manual rollout from source page and check the behavior if automated publish still happens?
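
The user-name check from the list above can be written down as a small lookup. This is only a sketch of the rule of thumb as stated here ('msm-service' for the rollout, 'replication-service' for the flush agent trigger). The class and enum names are hypothetical, and other service users can show up in the queue as well:

```java
/** Hypothetical helper: maps the user shown on a queue entry to its likely trigger. */
final class TriggerSource {

    enum Source { MSM_ROLLOUT, FLUSH_AGENT_TRIGGER, OTHER }

    /** Applies the rule of thumb from the list above; anything else is reported as OTHER. */
    static Source fromQueueUser(String userId) {
        if ("msm-service".equals(userId)) {
            return Source.MSM_ROLLOUT;          // replication requested by the rollout
        }
        if ("replication-service".equals(userId)) {
            return Source.FLUSH_AGENT_TRIGGER;  // request raised by the agent's own trigger
        }
        return Source.OTHER;                    // e.g. a manual publish by an author
    }
}
```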

Reference (MSM Best Practices):

onModify

When using the rollout trigger onModify you should consider that:

  • Automating rollouts with onModify triggers may have a negative impact on authoring performance as they trigger rollouts after every page modification.
  • The rollout result may differ from the one expected as:
    • You cannot specify the order of the resulting modify events.
    • The event-based architecture cannot guarantee the sequence of the events passed to the Rollout Manager.
  • Using such a rollout configuration could lead to commit conflicts if concurrent updates of the same resource occur.

Therefore, it is recommended that you only use onModify triggers if the benefits of automatic rollout initiation outweigh any potential performance issues.

Jörg_Hoh

Employee

21-01-2019

Strange. I see that the thread "ReplicateOnModification Processor" performs a replication with options and a filter set. That looks good. On the other hand, in line 14 a replication is set up without any filter (as you mentioned), but this is done by "thread-1352". In AEM and Sling I would expect the product to use thread pools to manage such threads, so an unmanaged thread looks a bit suspicious to me. Can you repeat your scenario and take thread dumps to get a full stack trace? That can give us a further indication of the root cause.

Jdruwe

22-01-2019

How have you configured the flush agent to be triggered automatically on modification as you have mentioned in the description? Is it via "On modification" checkbox under "Triggers" tab or something else?

The dispatcher flush agent is the only agent that has the trigger 'On Modification' checked on its triggers tab.

Do you have "On modification" checkbox turned on under "Triggers" tab for default publish agent?

The publish agent has no triggers checked on its triggers tab.

Could you disable both default publish agent and dispatcher flush agent and check the blocked queues for both agents http://localhost:4502/etc/replication/agents.author/flush.html  and http://localhost:4502/etc/replication/agents.author/publish.html

  • Validate the user name that triggers the replication and flush? Is it 'msm-service' or 'replication-service' or some other?
    • NOTE: I have not disabled the agents but only changed their target IPs, because disabling them resulted in no queued items at all for either of them.
    • Editing a component
      • Dispatcher flush agent
        • component-edit-dispatcher-flush-agent.png
      • Publish agent

        • component-edit-publish-agent.png
    • Publishing the page
      • Dispatcher flush agent
        • publising-dispatcher-flush-agent.png
      • Publish agent
        • publising-publish-agent.png
  • Validate the sequence of tasks for flush and replication. If pushonmodify rollout/msm triggers that replication, then you should see the user name as 'msm-service'. If it's triggered by flush agent, then it would be with user 'replication-service'
    • See answer above, both 'msm-service' and 'replication-service' are listed.

Just to be clear - If you remove/disable the flush/invalidation agent then you do not observe this kind of auto-replication behavior? You could validate same with above mentioned step of checking publish agent queue/logs.

    • Editing a component
      • Dispatcher flush agent
        • No items on queue due to the agent being disabled
      • Publish agent
        • NO item is being added to the queue
    • Publishing a page
      • Dispatcher flush agent
        • No items on queue due to the agent being disabled
      • Publish agent
        • publishing2-publish-agent.png

It seems that when the dispatcher flush agent is disabled, I indeed don't see the auto-replication behaviour on modification. Only when I actually publish the page is the corresponding item added to the publish agent queue.

Could you remove 'pushonmodify' from live copies and do a manual rollout from source page and check the behavior if automated publish still happens?

When I replace my LiveCopy rollout configuration:

/etc/msm/rolloutconfigs/activate

/etc/msm/rolloutconfigs/deactivate

/etc/msm/rolloutconfigs/pushonmodify

with:

/etc/msm/rolloutconfigs/activate

/etc/msm/rolloutconfigs/deactivate

/etc/msm/rolloutconfigs/default

And trigger a manual rollout using http://localhost:4502/etc/blueprints.html, the auto-replication behaviour no longer happens. We benefit a lot from the '/etc/msm/rolloutconfigs/pushonmodify' config, and we are aware of the potential negative performance impact.

Jdruwe

22-01-2019

When I navigate to http://localhost:4502/system/console/status-jstack-threaddump, repeat the scenario, and look for 'Thread-1349', I find the following:

"Thread-1349" #3948 daemon prio=1 os_prio=31 tid=0x00007fa7c8aeb000 nid=0x16107 waiting on condition [0x00007000112a1000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000074d6967e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I am not sure if this is of any use.
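
For reading such a dump: a worker whose stack passes through ThreadPoolExecutor.getTask is parked waiting for its next task, so the dump shows what the thread is doing at the moment of the dump, not what it did when the replication was set up. A minimal sketch of that check in plain Java, with a hypothetical class name (IdleWorkerCheck) and nothing AEM-specific:

```java
/** Hypothetical helper: recognises an idle java.util.concurrent pool worker by its stack. */
final class IdleWorkerCheck {

    /** True if any frame is ThreadPoolExecutor.getTask, i.e. the worker is waiting for work. */
    static boolean isIdlePoolWorker(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if ("java.util.concurrent.ThreadPoolExecutor".equals(frame.getClassName())
                    && "getTask".equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }
}
```

Inside a running JVM the same check could be applied to the values of Thread.getAllStackTraces(). To see who actually calls the replicator, the thread has to be caught while it is busy, for example with a profiler or a breakpoint.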

Gaurav-Behl

MVP

22-01-2019

Editing a component -

The pushonmodify rollout pushed the content changes from the source to the live copies and then generated an 'onmodify' event, which was captured by the flush agent. Hence you see two invalidation requests by 'replication-service', triggered by the flush agent.

I'm not sure why 'webservice-support-replication' got triggered for the child pages. The explanation and fix for that are in [1]. Now 'msm-service' publish and invalidation request got triggered from the 'onmodify' trigger that was launched by 'pushonmodify' rollout. I believe this is what you mentioned as the problem statement. I'm not sure if this is a bug or expected functionality.

On publish -

Since the admin user triggered the publish, the first row is fine. In this case 'webservice-support-replication' [1] still got triggered. The rows with 'msm-service' are fine, as it was a publish + invalidation request.

[1] - Flush replications are triggered by webservice-support-replication user | AEM 6.x

As far as I know, the common listener for the publish agent and the invalidation agent doesn't distinguish which action to take if the filter is null (which is the case here for the onmodify event). Jörg Hoh can probably help here.

'Thread-1349' in the stack trace might be waiting on the agent itself because the queue was blocked.

Jörg_Hoh

Employee

22-01-2019

That's the stack trace of an unused thread in a thread pool. To me it looks like this thread pool was not created using Sling mechanics; otherwise the thread name would be more meaningful (something like "pool-20-thread-7") or even a speaking name ("oak-observation-1"). Are you sure that this thread is the one in question?

On the other hand, I see a "thread-25" here on my local AEM 6.4 instance which has the same stack trace. That means it can still be part of the product. Let me investigate that.
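
The naming point can be illustrated with plain java.util.concurrent, outside of AEM. In this sketch (class and method names are hypothetical) a pool built with the JDK default thread factory names its workers "pool-N-thread-M", while a factory that simply calls new Thread(r) produces the JVM default "Thread-N". It does not show how the thread in question was actually created, only why such a name hints at a pool set up without a naming factory:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: how the thread factory decides the name of a pool worker. */
final class ThreadNamingDemo {

    /** Runs one task on the pool, returns the name of the worker that ran it, then shuts down. */
    static String workerName(ExecutorService pool) {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** JDK default thread factory: names follow the pattern "pool-N-thread-M". */
    static String defaultFactoryName() {
        return workerName(Executors.newSingleThreadExecutor());
    }

    /** Factory that only calls new Thread(r): names follow the JVM default "Thread-N". */
    static String plainThreadFactoryName() {
        return workerName(Executors.newSingleThreadExecutor(r -> new Thread(r)));
    }
}
```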

Jdruwe

23-01-2019

"Now 'msm-service' publish and invalidation request got triggered from the 'onmodify' trigger that was launched by 'pushonmodify' rollout. I believe this is what you mentioned as the problem statement. I'm not sure if this is a bug or expected functionality." Indeed, I am not sure either :S

Jdruwe

23-01-2019

I investigated the thread that logs the empty ReplicationOptions using VisualVM:

[Thread-3788] com.day.cq.replication.impl.ReplicatorImpl Setting up replication with options: ReplicationOptions{synchronous=false, revision='null', suppressStatusUpdate=false, suppressVersions=false, filter=null, aggregateHandler=null}

1674304_pastedImage_2.png

I see no custom code interfering; it all seems to be part of the product.

Jdruwe

28-01-2019

Do you have any idea, Jörg?