Monitor/Alert when listener and scheduler stop running

elizabethp28245
Level 1

09-09-2019

Hello. I currently have a replication listener (which generates a PDF when a site is published) and a scheduler, and they randomly stop working until I reactivate them. I want to create a way to monitor them and alert my team when they stop working, because they are a critical part of our automated process. I read about Health Checks, which seem like a good solution for the scheduler, but I have not found anything for the replication listener.

Replies

berliant
Employee

10-09-2019

Gaurav-Behl
MVP

10-09-2019

Did you/your team get a chance to debug its root cause? Does it throw any error in logs?

It seems that you plan to send an automated alert so that someone could manually restart the bundle as a workaround each time it happens.

antoniom5495929
Level 6

10-09-2019

Hi,

What you can do is use a JCR property as configuration to activate or deactivate your replication listener.

In short, you can use a page properties dialog with a checkbox that enables or disables the replication listener, and after the property is saved you can send a notification if the configuration is disabled. In your replication listener you need an if statement that lets your code proceed only when the property is enabled.

Let me know if this could be a solution or if you need more info.

Thanks,
Antonio

Jörg_Hoh
Employee

10-09-2019

Hi Antonio,

I don't think this is a good solution. First of all, you are just working around the problem. Secondly, you allow somebody (an author) to restart services, which can have a lot of consequences (depending on the implementation, it can leave the system briefly unavailable when the setting is changed).

Jörg_Hoh
Employee

10-09-2019

If you want to observe a scheduled job, a healthcheck can be helpful. You need to implement a healthcheck which checks that the last run of this service was within the last N minutes/hours. Your service has to record the current time before it finishes the processing, and the HC just compares this timestamp with the current time.
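
The timestamp comparison described above can be sketched in plain Java. This is only an illustration under assumptions: `LastRunTracker`, `recordRun` and `isHealthy` are hypothetical names, and the wiring into an actual health check service and scheduled job is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper shared by the scheduled job and the health check. */
final class LastRunTracker {
    // Time of the last completed run; null until the job has run once.
    private final AtomicReference<Instant> lastRun = new AtomicReference<>();
    // The "N minutes/hours" threshold from the post.
    private final Duration maxAge;

    LastRunTracker(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /** Called by the job just before it finishes processing. */
    void recordRun(Instant now) {
        lastRun.set(now);
    }

    /** Called by the health check: true if the last run is at most maxAge old. */
    boolean isHealthy(Instant now) {
        Instant last = lastRun.get();
        return last != null && Duration.between(last, now).compareTo(maxAge) <= 0;
    }
}
```

The job would call `recordRun(Instant.now())` as its last step, and the health check would report a failure whenever `isHealthy(Instant.now())` returns false. A job that has never run is treated as unhealthy here, which is a design choice and may need a grace period right after startup.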

Regarding the root cause analysis: I have never had this problem, so it's definitely an issue somewhere. I doubt that the scheduler broke down; I rather assume that your service threw an exception and is left with an inconsistent in-memory state. Check that you handle all exceptions and at least log them.
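
A minimal sketch of the "handle all exceptions and log them" advice, assuming the job body can be wrapped in a `Runnable`. `SafeJob` and `lastFailure` are hypothetical names, and `java.util.logging` is used only to keep the example self-contained; a real service would use whatever logging API the project already has.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper: runs the real job and never lets a runtime exception escape. */
final class SafeJob implements Runnable {
    private static final Logger LOG = Logger.getLogger(SafeJob.class.getName());

    private final Runnable delegate;
    // Last exception seen, or null if the most recent run succeeded.
    private volatile Throwable lastFailure;

    SafeJob(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        try {
            delegate.run();
            lastFailure = null;
        } catch (RuntimeException e) {
            // Log with the stack trace so the failure is visible in the logs.
            lastFailure = e;
            LOG.log(Level.SEVERE, "Scheduled job failed", e);
        }
    }

    Throwable lastFailure() {
        return lastFailure;
    }
}
```

With this in place a failed run leaves a log entry instead of silence, and `lastFailure()` gives a health check something extra to report.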

antoniom5495929
Level 6

11-09-2019

Hi,

I think we could move this configuration into a system console configuration, so that only an administrator could enable or disable it. By the way, it seems that no one has provided a solution for this point.

It seems that everyone suggests using a health check for the scheduler (which is really simple), but no solution has been provided for the replication service.

Thanks,

Antonio

Premkarthic-7WP
Level 3

11-09-2019

hi,

We had a similar issue with a replication listener that calls a third-party API. Sometimes the response took more than 10 seconds to come back, and the listener stopped randomly. While debugging we noticed the error below:

org.apache.felix.eventadmin EventAdmin: Blacklisting ServiceReference [[org.osgi.service.event.EventHandler]

We found that the cause is the following:

"The Apache Felix Event Admin implementation is trying the deliver the events as fast as possible. Events sent from different threads are sent in parallel. Events from the same thread are sent in the order they are received (this is according to the spec). A timeout can be configured which is used for event handlers. If an event handler takes longer than the configured timeout to process an event, it is blacklisted. Once a handler is in a blacklist, it doesn't get sent any events anymore. The Felix Event Admin can be configured either through framework properties or through the configuration admin using PID org.apache.felix.eventadmin.impl.EventAdmin."

(Source: Apache Felix, Apache Felix Event Admin.)

We tried increasing the timeout value in org.apache.felix.eventadmin.Timeout, but that didn't help either.

So we moved part of the event handling code to a dedicated thread pool, and it has never stopped again. Hope this helps.
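
A rough sketch of that handoff, written in plain Java without the OSGi types so it stays self-contained. `AsyncReplicationHandler` is a hypothetical name, `handleEvent(String)` stands in for the real `EventHandler.handleEvent(Event)` callback, and the pool size is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical handler that returns quickly and does the slow work on its own pool. */
final class AsyncReplicationHandler {
    private final ExecutorService pool;
    // The slow part, e.g. the third-party API call or the PDF generation.
    private final Consumer<String> slowWork;

    AsyncReplicationHandler(int threads, Consumer<String> slowWork) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.slowWork = slowWork;
    }

    /** Stand-in for the event callback: only queues the work, so it cannot hit the handler timeout. */
    void handleEvent(String path) {
        pool.execute(() -> {
            try {
                slowWork.accept(path);
            } catch (RuntimeException e) {
                // Keep failures visible; a worker exception must not go unnoticed.
                System.err.println("Processing failed for " + path + ": " + e);
            }
        });
    }

    /** Call when the component is deactivated; returns true if queued work finished in time. */
    boolean shutdown(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because `handleEvent` only enqueues a task, the time spent inside the event callback no longer depends on how long the external call takes. The trade-off is that work queued but not yet processed is lost if the instance stops, so a persistent job queue may be a better fit when every event must be handled.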

elizabethp28245
Level 1

11-09-2019

I checked the logs every time I noticed that the job had not run in the past few days. The reason I think the scheduler got deactivated is that there was no output from the scheduler in the logs, and I was able to reactivate it by opening its configuration in the system configuration manager and clicking save, regardless of whether I changed anything. There is a chance the scheduler was deactivated whenever the bundle restarted. I will build the health check so I can monitor that it does not happen again.

Jörg_Hoh
Employee

12-09-2019

If you were able to crash the scheduler, it's definitely worth a support ticket. Another possible root cause could be that all threads of the scheduler were exhausted (maybe by endless loops or blocked threads), but that should be easy to spot in a thread dump.

Jörg