SOLVED

Custom Replication Event Handler not triggering intermittently on page activations. Please suggest.

Level 4

Hello Team,

My custom replication event handler intermittently fails to trigger. It fires for a while, then stops firing for a while, and all activation events during that period are lost. To be precise: the page is activated to the publish instance, but the activation event handler is not invoked.

We send information about each activated page to a third-party system, which then emails the subscribers. Because the replication event handler intermittently fails to fire, some page activation events are lost and the subscribers do not receive their email notifications.

Here is my replication listener code.

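import java.util.HashMap;
import java.util.Map;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.Service;
import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.osgi.framework.BundleContext;
import org.osgi.service.component.ComponentContext;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.day.cq.replication.ReplicationAction;
import com.day.cq.replication.ReplicationActionType;

@Component(immediate = true)
@Service(value = EventHandler.class)
@Property(name = "event.topics", value = ReplicationAction.EVENT_TOPIC)
public class GlobalReplicationListener implements EventHandler {

    private static final Logger LOGGER = LoggerFactory.getLogger(GlobalReplicationListener.class);

    @Reference
    private ResourceResolverFactory resourceResolverFactory;

    // ThirdPartyIntegrationService is the poster's own service; its package is not shown.
    @Reference
    private ThirdPartyIntegrationService thirdPartyIntegrationService;

    private BundleContext bundleContext;

    @Override
    public void handleEvent(Event event) {
        ReplicationAction action = ReplicationAction.fromEvent(event);
        // fromEvent returns null when the event carries no replication action.
        if (action == null || action.getType() != ReplicationActionType.ACTIVATE) {
            return;
        }
        LOGGER.info("+++++++++++++ Replication Event Activation is invoked ++++++++++++++++++ {}", action.getPath());

        // SUBSERVICE_USER_NAME_WRITE is a constant whose definition is not shown in this post.
        Map<String, Object> param = new HashMap<String, Object>();
        param.put(ResourceResolverFactory.SUBSERVICE, SUBSERVICE_USER_NAME_WRITE);

        ResourceResolver resolver = null;
        try {
            resolver = resourceResolverFactory.getServiceResourceResolver(param);
            Resource resource = resolver.getResource(action.getPath());
            // Get the JCR information from the resource value map and call the third-party service.
            thirdPartyIntegrationService.sendPageInformation();
        } catch (LoginException e) {
            LOGGER.error("Could not obtain a service resource resolver", e);
        } finally {
            if (resolver != null) {
                resolver.close();
            }
        }
    }

    protected void activate(ComponentContext ctx) {
        this.bundleContext = ctx.getBundleContext();
    }

    protected void deactivate(ComponentContext ctx) {
        this.bundleContext = null;
    }
}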

I have also tried the OSGi configuration of the Felix Event Admin implementation, configuring the timeout to be ignored for the package that contains my listener.

/apps/mycompany/config.author/org.apache.felix.eventadmin.impl.EventAdmin.config.xml

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    org.apache.felix.eventadmin.ThreadPoolSize="{Long}20"
    org.apache.felix.eventadmin.AsyncToSyncThreadRatio="{Double}0.5"
    org.apache.felix.eventadmin.Timeout="{Long}5000"
    org.apache.felix.eventadmin.RequireTopic="{Boolean}true"
    org.apache.felix.eventadmin.IgnoreTimeout="[org.apache.felix*,org.apache.sling*,com.day*,com.adobe*,com.mycompany.listeners.*]"
    org.apache.felix.eventadmin.IgnoreTopic="[]"/>

I am not seeing anything specific in the logs that explains why my event handler intermittently fails to fire. Could someone help with pointers or suggestions to solve this problem?

1 Accepted Solution

Correct answer by
Employee Advisor

Hi,

Does the event handler recover from this situation or does it require a restart? In case of a restart, can you try to restart just the bundle containing the GlobalReplicationListener service?

You already ruled out my first assumption (blacklisting of the service because it took too long). I don't know of any other way an event listener could suddenly stop being triggered.

Jörg

6 Replies

Community Advisor

Can you put a log statement above the if condition and check how far the code gets? Additionally, add a try/catch and see whether you catch any error in the catch block. There should be something. Please let me know if you can figure this out after adding the try/catch.

Level 4

Thank you Jörg for your response.

Looking at the logs more closely, I see that my calls to the third-party service intermittently take longer than usual. When that happens the event handler gets blacklisted, and from then on it no longer receives any events.

"10.10.2017 04:36:36.596 *WARN* [Thread-12] org.apache.felix.eventadmin EventAdmin: Blacklisting ServiceReference [[org.osgi.service.event.EventHandler] | Bundle(com.mycompany.mybundle-core [492])] due to timeout!"

This log message appears in between the multiple calls that I make to the third-party service inside the event handler. The handler times out before my third-party calls complete and is therefore blacklisted.

I solved the issue by increasing the timeout from 5000 (the default) to org.apache.felix.eventadmin.Timeout="{Long}10000". Are there any performance concerns or side effects of setting a higher timeout value?

I am not sure why org.apache.felix.eventadmin.IgnoreTimeout="[org.apache.felix*,org.apache.sling*,com.day*,com.adobe*,com.mycompany.listeners.*]" did not take precedence, since I had configured the timeout to be ignored for my event handler package.

Employee Advisor

The ignoreTimeout parameter should have taken effect. Increasing the timeout will not fix the issue, because as soon as one handler takes more than 10 seconds, your handler will be blacklisted again.

You should offload the actual handling of the listener into a dedicated thread pool; then this limit no longer interferes.

Jörg
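The offloading suggested above can be sketched roughly as follows. This is a minimal illustration in plain Java, not code from this thread: the class name AsyncPageNotifier and its methods are hypothetical, and the Consumer stands in for whatever slow third-party call the handler makes. The idea is that handleEvent only queues the work and returns immediately, so Event Admin never sees a slow handler.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical helper: runs slow page notifications on its own thread pool so
 * that the caller (for example EventHandler.handleEvent) returns immediately.
 */
final class AsyncPageNotifier {

    private final ExecutorService pool;
    private final Consumer<String> sender;

    AsyncPageNotifier(int threads, Consumer<String> sender) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.sender = sender;
    }

    /** Queues the notification and returns without waiting for the sender. */
    void submit(String pagePath) {
        pool.execute(() -> {
            try {
                sender.accept(pagePath);
            } catch (RuntimeException e) {
                // Log and swallow so one failed notification does not take down the worker thread.
                System.err.println("Notification failed for " + pagePath + ": " + e);
            }
        });
    }

    /** Stops accepting work and waits up to the given time for queued tasks. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In the listener, handleEvent would then call something like notifier.submit(action.getPath()), and deactivate would call shutdownAndWait. Note that work still queued in memory is lost if the instance stops before the pool drains, so this is a starting point rather than a guaranteed-delivery mechanism.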

Level 4

Thanks Jörg for your response.

I agree: if any third-party service call takes more than 10 seconds, the handler will fail again and go back to blacklisted status. I had thought of setting the value to less than 100 ms, which, according to the documentation, makes Event Admin ignore the timeout.

I did not try offloading the handling of listener on to dedicated thread pool.

There was a weird issue with ignoreTimeout: org.apache.felix.eventadmin.IgnoreTimeout="[org.apache.felix*,org.apache.sling*,com.day*,com.adobe*,com.mycompany.listeners.*]"

I had to change com.mycompany.listeners.* to com.mycompany.listeners*, matching the form of the Adobe and Sling packages above. After that, ignoreTimeout took precedence over my timeout value.
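One way to picture why the dot matters is the small model below. It is not the Felix source. It is a sketch that assumes a pattern ending in a star has the star stripped and a dot appended before being compared against the package part of the class name, which would be consistent with the behaviour described in this thread. The class and method names are hypothetical, and patterns without a trailing star are simply treated as exact class names here.

```java
/**
 * Illustrative model only (not Felix code): assumes a trailing '*' is removed
 * and a '.' is appended before prefix-matching the class's package.
 */
final class IgnoreTimeoutPatternSketch {

    static boolean matches(String pattern, String className) {
        if (!pattern.endsWith("*")) {
            // Assumption for this sketch: anything else is an exact class name.
            return pattern.equals(className);
        }
        // "com.mycompany.listeners*"  -> prefix "com.mycompany.listeners."
        // "com.mycompany.listeners.*" -> prefix "com.mycompany.listeners.." (never matches)
        String prefix = pattern.substring(0, pattern.length() - 1) + ".";
        int lastDot = className.lastIndexOf('.');
        return lastDot > -1 && className.substring(0, lastDot + 1).startsWith(prefix);
    }
}
```

Under that assumption, com.mycompany.listeners* matches com.mycompany.listeners.GlobalReplicationListener, while com.mycompany.listeners.* produces a prefix with a double dot and matches nothing.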

Level 1

Thanks @cqvoyager for sharing this.
Using the star (*) without the dot (.) resolved the issue for me.
Very helpful!