AEM fails to start under heavy load

liaquathk607427

06-12-2018

Hi Guys,

I have an AEM publish instance running on an AWS EC2 instance. A monit script starts and stops the AEM service.

It had been working fine. However, on one occasion, under heavy load on the publish instance, the AEM service failed to start when a new EC2 instance spun up as part of autoscaling.

Looking forward to your suggestions.

----------

check process aem with pidfile /srv/sfw/aem/crx-quickstart/conf/cq.pid
    start program = "/sbin/service aem start"
    stop program = "/sbin/service aem stop"
    if failed port 8080 then restart

-----------

The aem script

--------------------

CQ5_ROOT=/< >
CQ5_USER=< >
########
SERVER=${   }/crx-quickstart
START=${SERVER}/bin/start
STOP=${SERVER}/bin/stop
STATUS="${SERVER}/bin/status"

case "$1" in
start)
    echo -n "Starting aem services: "
    su - ${CQ5_USER} ${START}
    touch /var/lock/subsys/aem
    ;;
stop)
    echo -n "Shutting down aem services: "
    su - ${CQ5_USER} ${STOP}
    sleep 20
    rm -f /var/lock/subsys/aem
    ;;
status)
    su - ${CQ5_USER} ${STATUS}
    ;;
restart)
    su - ${CQ5_USER} ${STOP}
    su - ${CQ5_USER} ${START}
    ;;
reload)
    ;;
*)
    echo "Usage: aem {start|stop|status|reload}"
    exit 1
    ;;
esac

Replies

Jörg_Hoh

Employee

08-12-2018

How was it failing? Do you have log data? The data you provided unfortunately does not give any indication of what went wrong.

Jörg

liaquathk607427

09-12-2018

Hi Jörg,

Thanks for your reply. Monit was unable to start the AEM service.

This is the monit log I have:

[AEST Sep  1 13:38:48] info     : monit: generated unique Monit id 815c50e52d068454185da5c072e65b81 and stored to '/root/.monit.id'

[AEST Sep  1 13:38:48] info     : Starting monit HTTP server at [localhost:2812]

[AEST Sep  1 13:38:48] info     : monit HTTP server started

[AEST Sep  1 13:38:48] info     : 'system_aem-publish-prd-blue-113' Monit started

[AEST Sep  1 13:38:48] error    : 'aem' failed, cannot open a connection to INET[localhost:8080] via TCP

[AEST Sep  1 13:38:48] info     : 'aem' trying to restart

[AEST Sep  1 13:38:48] info     : 'aem' stop: /sbin/service

[AEST Sep  1 13:38:52] info     : 'aem' start: /sbin/service

[AEST Sep  1 13:40:53] info     : 'aem' connection succeeded to INET[localhost:8080] via TCP

[AEST Sep  1 13:50:45] info     : stop service 'aem' on user request

[AEST Sep  1 13:50:45] info     : monit daemon at 3211 awakened

[AEST Sep  1 13:50:45] info     : Awakened by User defined signal 1

[AEST Sep  1 13:50:45] info     : 'aem' stop: /sbin/service

[AEST Sep  1 13:51:15] error    : 'aem' failed to stop

[AEST Sep  1 13:51:15] info     : 'aem' stop action done

[AEST Sep  1 13:54:24] info     : start service 'aem' on user request

[AEST Sep  1 13:54:24] info     : monit daemon at 3211 awakened

[AEST Sep  1 13:54:24] info     : Awakened by User defined signal 1

[AEST Sep  1 13:54:24] info     : 'aem' start: /sbin/service

[AEST Sep  1 13:54:25] info     : 'aem' started

[AEST Sep  1 13:54:25] info     : 'aem' start action done

[AEDT Oct 10 13:55:55] error    : 'aem' process is not running

[AEDT Oct 10 13:55:55] info     : 'aem' trying to restart

[AEDT Oct 10 13:55:55] info     : 'aem' start: /sbin/service

[AEDT Oct 10 13:56:25] error    : 'aem' failed to start

[AEDT Oct 10 13:58:25] info     : 'aem' process is running with pid 25063
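In the Oct 10 entries above, the start command is issued at 13:55:55, monit reports "failed to start" 30 seconds later, and the process only shows up at 13:58:25. The 30-second gap matches monit's default program timeout, which suggests (but does not prove) that monit stopped waiting before AEM had finished starting. A minimal sketch for pulling just the start-related events out of a monit log follows; the function name `aem_start_events` and the log path in the usage comment are assumptions, not something taken from this thread.

```shell
#!/bin/bash
# Hypothetical helper: read a monit log on stdin and print only the lines
# that describe start attempts for the 'aem' service and their outcome.
aem_start_events() {
    grep -E "'aem' (start: |started|failed to start|process is (not )?running)"
}

# Usage (the log path is an assumption; check "set logfile" in monitrc):
#   aem_start_events < /var/log/monit.log
```

Reading the filtered lines side by side makes it easier to compare how long each start attempt took against the time monit was willing to wait.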

liaquathk607427

09-12-2018

Also, Java was under a lot of load, which caused the AEM service to stop, and the monit script failed to restart it. As a result, manual intervention was required to properly stop and start the AEM service.

chandu_t

10-12-2018

In my observation, AEM start and stop take a long time under high load, but it should eventually come up. Do you have any AEM logs for that period?
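The Sep 1 entries in the monit log fit this: the start is issued at 13:38:52 and port 8080 only answers at 13:40:53, roughly two minutes later. One way to make an init script tolerate a slow start is to poll the port with a generous timeout instead of returning immediately. The following is only a sketch under assumptions: the function name `wait_for_port` and the timeout in the usage comment are made up for illustration, and it relies on bash's `/dev/tcp` redirection, so it needs bash rather than plain sh.

```shell
#!/bin/bash
# Hypothetical helper: poll a TCP port until it accepts connections
# or the timeout (in seconds) expires.
# Returns 0 once a connection succeeds, 1 on timeout.
wait_for_port() {
    local host=$1 port=$2 timeout=$3
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline )); do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Usage (600 seconds is an assumed value, not a recommendation):
#   wait_for_port localhost 8080 600 || echo "AEM did not open port 8080 in time"
```

Note that an open port only shows that the HTTP listener is up, not that every bundle has started, so this is a coarse readiness check.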

liaquathk607427

10-12-2018

Hi Chandu,

No, unfortunately I do not have any AEM logs. Moreover, monit was not starting AEM; I have pasted the scripts for it above in the thread.

Jörg_Hoh

Employee

11-12-2018

Based on the above log statements alone, it's impossible to guess what caused AEM not to start. Are there more logs of this process available (e.g. the output of the commands monit runs in the background)?

Jörg

Jörg_Hoh

Employee

14-12-2018

Well, if neither AEM logs nor the output of the start script itself (which might indicate JVM problems) are available, I don't see how I can help you resolve this issue 😕

Jörg
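For a future occurrence, the start script's output could be kept by having the init script append it to a file instead of discarding it. A minimal sketch, assuming a hypothetical helper named `run_logged` and a log path chosen only for illustration:

```shell
#!/bin/bash
# Hypothetical wrapper: run a command and append its stdout and stderr,
# preceded by a timestamp line, to a log file.
# Returns the exit status of the wrapped command.
run_logged() {
    local logfile=$1
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') running: $*"
        "$@"
    } >> "$logfile" 2>&1
}

# Usage inside the init script (the log path is an assumption):
#   run_logged /var/log/aem-init.log su - ${CQ5_USER} ${START}
```

With this in place, any JVM error printed during a failed start would at least be available afterwards.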

liaquathk607427

16-12-2018

Hi Jörg,

I understand that without logs it is hard to point to a solution. However, do you have any suggestions on modifying the startup script pasted above?
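One thing visible in the script as pasted: the stop branch sleeps 20 seconds after calling the stop script, while the restart branch calls start immediately after stop. If the old JVM is slow to exit under load, the start may run while the previous process is still alive. Whether that played a part here cannot be told from the logs. A sketch of waiting for the old process to exit, assuming a hypothetical helper named `wait_for_exit` and a timeout picked only for illustration:

```shell
#!/bin/bash
# Hypothetical helper: wait until the process recorded in a pid file has
# exited, or give up after the timeout (in seconds).
# Returns 0 if the process is gone (or no pid is recorded), 1 on timeout.
wait_for_exit() {
    local pidfile=$1 timeout=$2 pid
    # No pid file, or an empty one: nothing to wait for.
    [ -s "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    local deadline=$(( SECONDS + timeout ))
    # kill -0 sends no signal; it only checks that the pid exists.
    # Run as root (as an init script does) so the check is not blocked
    # by permissions on another user's process.
    while kill -0 "$pid" 2>/dev/null; do
        (( SECONDS >= deadline )) && return 1
        sleep 1
    done
    return 0
}

# Usage in the restart branch (120 seconds is an assumed value):
#   su - ${CQ5_USER} ${STOP}
#   wait_for_exit /srv/sfw/aem/crx-quickstart/conf/cq.pid 120
#   su - ${CQ5_USER} ${START}
```

The pid file path in the usage comment is the one from the monit check at the top of the thread.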