In my local environment I have 4 publishers. Sometimes a replication agent gets blocked, and it takes me a while to find the cause. Is there any way to monitor the replication queue and get an alert if it is blocked?
Hi @Akash1247!
It is a good idea to monitor the status of your replication queues.
There are different approaches to this. One that I have seen with some of my customers is the following:
A simple BASH script could look like the following:
curl ${CURLPARAMETERS} -o tmp-replication-check.txt "http://${CQ_HOSTPORT}/etc/replication/agents.${CQ_TYPE}.html"
# get activated agents
REPL_AGENTS=`cat tmp-replication-check.txt | grep "cq-agent-header-on" | cut -d"\"" -f4`
REPL_AGENT_NUM=`cat tmp-replication-check.txt | grep "cq-agent-header-on" | cut -d"\"" -f4 | wc -l`
REPL_AGENT_PROBLEMS=0
AGENT_COUNT=0
echo "Found ${REPL_AGENT_NUM} activated replication agents, now checking each of them..."
# check all activated agents
for AGENT in ${REPL_AGENTS}
do
AGENT=${AGENT%.html}
SHORT_AGENT=`echo ${AGENT} | cut -d"/" -f5`
AGENT_COUNT=$((AGENT_COUNT + 1))
# check agent status
REPL_AGENT_STATUS=`cat tmp-replication-check.txt | grep -a2 "${AGENT}" | egrep "cq-agent-status-|cq-agent-queue-" | cut -d"\"" -f2 | cut -d" " -f2`
REPL_AGENT_UNEXP_STATUS=`echo "${REPL_AGENT_STATUS}" | grep -v "cq-agent-status-ok" | grep -v "cq-agent-queue-idle" | grep -v "cq-agent-queue-active" | wc -l`
if [ "${REPL_AGENT_UNEXP_STATUS}" -gt 0 ]; then
ERROR_OUTPUT=`echo "${REPL_AGENT_STATUS}" | tr "\\n" " "`
echo " Agent ${AGENT} has unexpected status: ${ERROR_OUTPUT} - please check!"
REPL_AGENT_PROBLEMS=$((REPL_AGENT_PROBLEMS + 1))
else
echo " Status of agent ${AGENT} is OK."
fi
# check logfile of agent
curl ${CURLPARAMETERS} -o tmp-replication-log-${AGENT_COUNT}.txt "http://${CQ_HOSTPORT}${AGENT}.log.html"
LOGS_NOTOK=false
# check logs for ERROR messages
NUM_LINES_TO_CHECK=25
NUM_ERRORS_IN_LOGS=`tail -${NUM_LINES_TO_CHECK} tmp-replication-log-${AGENT_COUNT}.txt | grep "ERROR" | wc -l`
if [ "${NUM_ERRORS_IN_LOGS}" -gt 0 ]; then
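# read from the same file that was downloaded above (tmp-replication-log-N.txt)
ERROR_OUTPUT_LOG=`tail -${NUM_LINES_TO_CHECK} tmp-replication-log-${AGENT_COUNT}.txt | grep "ERROR"`
echo " Agent ${AGENT} shows ERRORs in logs - please check!"
# quoted so that each log line stays on its own line
echo "${ERROR_OUTPUT_LOG}"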
REPL_AGENT_PROBLEMS=$((REPL_AGENT_PROBLEMS + 1))
LOGS_NOTOK=true
else
LOGS_NOTOK=false
echo " LOGs of agent ${AGENT} are OK."
fi
# cleanup
rm -f tmp-replication-log-${AGENT_COUNT}.txt
done
if [ "${REPL_AGENT_PROBLEMS}" -gt 0 ]; then
echo "Found ${REPL_AGENT_PROBLEMS} problem(s) with agents! Please check!"
RETURN_CODE=2
else
echo "All activated agents are ok."
RETURN_CODE=0
fi
# cleanup
rm -f tmp-replication-check.txt
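The status classification and the final return code are the parts a monitoring system cares about most, so here is a minimal sketch of those two steps as small functions. The function names `count_unexpected_statuses` and `problems_to_exit_code` are made up for this illustration and are not part of the original framework. The healthy status tokens are the ones filtered out by the `grep -v` calls in the script; the sketch matches them exactly, which is slightly stricter than the substring match in the script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads one status token per line on stdin and prints
# how many of them are NOT one of the three healthy values.
count_unexpected_statuses() {
    awk '
        $0 == "" { next }
        $0 != "cq-agent-status-ok" &&
        $0 != "cq-agent-queue-idle" &&
        $0 != "cq-agent-queue-active" { n++ }
        END { print n + 0 }
    '
}

# Hypothetical helper: maps the number of problems to the values the script
# stores in RETURN_CODE (0 = all agents ok, 2 = at least one problem).
problems_to_exit_code() {
    if [ "$1" -gt 0 ]; then
        echo 2
    else
        echo 0
    fi
}
```

The script above only sets RETURN_CODE. If it runs standalone, ending it with `exit "${RETURN_CODE}"` would probably be the simplest way to hand the result to cron or a monitoring agent.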
Disclaimer: This script is an extract from a fairly old scripting framework I wrote several years ago. It has not been tested on its own and is not ready to run; it is meant for illustration only. There is a lot of room for improvement: error checks and input sanitization are missing, and there are probably better ways to achieve this. It is also debatable whether Bash is the right tool for this kind of task in the first place. It might be better to use a full scripting language, or to develop a proper health check endpoint in Java as part of your application that a monitoring system can consume directly.
In addition to monitoring, you should also investigate the root cause of the frequently blocked replication queues, as this is not normal behavior.
Hope this helps!
Hi,
To monitor and get alerts for blocked replication queues in AEM:
Access the Adobe Granite Replication dashboard at http://localhost:4503/libs/granite/replication/content/status.html.
Monitor replication agents' status at http://localhost:4502/etc/replication/agents.author.html.
Set up alerts using AEM's "Notifications" feature for specific replication issues.
Analyze replication agent logs for debugging.
Consider using third-party monitoring tools for more extensive monitoring.
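For the log analysis step, a small helper can count recent ERROR lines so that a script can decide whether an agent needs attention. This is only a sketch: the function name `count_recent_errors` is made up, and it assumes the agent log has already been saved to a local file (for example with curl).

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the number of lines containing "ERROR"
# among the last N lines of a log file (N defaults to 25).
count_recent_errors() {
    local logfile="$1"
    local lines="${2:-25}"
    tail -n "${lines}" "${logfile}" | awk '/ERROR/ { n++ } END { print n + 0 }'
}
```

Called as `count_recent_errors agent.log 50`, it looks at the last 50 lines only, so old errors that have already scrolled out of that window are ignored.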
@ManviSharma Can we write a script to get the status of each publisher, for example whether it is idle or blocked?
Or maybe do it with curl?
I can get the status of the publishers using /libs/granite/operations/content/healthreports/healthreport.html/system/sling/monitoring/mbeans/org/apache/sling/healthcheck/HealthCheck/replicationQueue
but that is again a manual step.
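One way to take the manual step out is to wrap whatever check you end up with in a small runner that only raises an alert when the check fails, and then schedule that runner (for example from cron). The sketch below assumes two things that are not part of AEM: a check command of your own that exits non-zero when a queue is blocked, and `ALERT_CMD`, a hypothetical hook that receives the alert text on stdin.

```shell
#!/usr/bin/env bash

# Hypothetical alert hook: gets the alert text on stdin.
# The default just prints it; replace it with a mail or chat command.
ALERT_CMD="${ALERT_CMD:-cat}"

# Runs the command given as arguments, captures its output, and pipes the
# output to ALERT_CMD only if the command exits with a non-zero status.
# Returns the exit status of the check command.
run_check_and_alert() {
    local output status
    output="$("$@" 2>&1)" && status=0 || status=$?
    if [ "${status}" -ne 0 ]; then
        printf 'Replication check failed (exit %s):\n%s\n' "${status}" "${output}" | ${ALERT_CMD}
    fi
    return "${status}"
}
```

With a check script in place, a call such as `run_check_and_alert ./check-replication.sh` (the script name is just a placeholder) stays silent while everything is fine and only produces output when something is wrong.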