Replication agent queue is blocked, frequently

Question

In my local env, I have 4 publishers. Sometimes a replication agent gets blocked, and it takes me time to find the cause. Is there any way to monitor the replication queue and get alerts if it is blocked?

MarkusBullaAdobe · Accepted Answer

Hi @akash1247!

It is a good idea to monitor the status of your replication queues. There are different approaches to this. One that I have seen with some of my customers is the following:

Write a script to query the status of your replication agents. This could be, e.g., a CLI script leveraging curl/wget to retrieve the replication agent status page (/etc/replication/agents.author.html) or the status of each replication agent individually.
Parse the output and return the status information in a format of your preference.
Integrate this script with your monitoring suite to retrieve alerts for blocked queues.

A simple BASH script could look like the following:

    # CURLPARAMETERS, CQ_HOSTPORT and CQ_TYPE are expected to be set by the caller
    curl ${CURLPARAMETERS} -o tmp-replication-check.txt "http://${CQ_HOSTPORT}/etc/replication/agents.${CQ_TYPE}.html"

    # get activated agents
    REPL_AGENTS=`cat tmp-replication-check.txt | grep "cq-agent-header-on" | cut -d'"' -f4`
    REPL_AGENT_NUM=`cat tmp-replication-check.txt | grep "cq-agent-header-on" | cut -d'"' -f4 | wc -l`
    REPL_AGENT_PROBLEMS=0
    AGENT_COUNT=0
    echo "Found ${REPL_AGENT_NUM} activated replication agents, now checking each of them..."

    # check all activated agents
    for AGENT in ${REPL_AGENTS}
    do
        AGENT=${AGENT%.html}
        SHORT_AGENT=`echo ${AGENT} | cut -d"/" -f5`
        AGENT_COUNT=$((AGENT_COUNT + 1))

        # check agent status
        REPL_AGENT_STATUS=`cat tmp-replication-check.txt | grep -a2 "${AGENT}" | egrep "cq-agent-status-|cq-agent-queue-" | cut -d'"' -f2 | cut -d" " -f2`
        REPL_AGENT_UNEXP_STATUS=`echo "${REPL_AGENT_STATUS}" | grep -v "cq-agent-status-ok" | grep -v "cq-agent-queue-idle" | grep -v "cq-agent-queue-active" | wc -l`
        if [ "${REPL_AGENT_UNEXP_STATUS}" -gt 0 ]; then
            ERROR_OUTPUT=`echo "${REPL_AGENT_STATUS}" | tr "\n" " "`
            echo "    Agent ${AGENT} has unexpected status: ${ERROR_OUTPUT} - please check!"
            REPL_AGENT_PROBLEMS=$((REPL_AGENT_PROBLEMS + 1))
        else
            echo "    Status of agent ${AGENT} is  OK."
        fi

        # check logfile of agent
        curl ${CURLPARAMETERS} -o tmp-replication-log-${AGENT_COUNT}.txt "http://${CQ_HOSTPORT}${AGENT}.log.html"
        LOGS_NOTOK=false

        # check the last lines of the log for ERROR messages
        NUM_LINES_TO_CHECK=25
        NUM_ERRORS_IN_LOGS=`tail -n ${NUM_LINES_TO_CHECK} tmp-replication-log-${AGENT_COUNT}.txt | grep "ERROR" | wc -l`
        if [ "${NUM_ERRORS_IN_LOGS}" -gt 0 ]; then
            ERROR_OUTPUT_LOG=`tail -n ${NUM_LINES_TO_CHECK} tmp-replication-log-${AGENT_COUNT}.txt | grep "ERROR"`
            echo "    Agent ${AGENT} shows ERRORs in logs - please check!"
            echo ${ERROR_OUTPUT_LOG}
            REPL_AGENT_PROBLEMS=$((REPL_AGENT_PROBLEMS + 1))
            LOGS_NOTOK=true
        else
            LOGS_NOTOK=false
            echo "    LOGs of agent   ${AGENT} are OK."
        fi

        # cleanup
        rm -f tmp-replication-log-${AGENT_COUNT}.txt
    done

    if [ "${REPL_AGENT_PROBLEMS}" -gt 0 ]; then
        echo "Found ${REPL_AGENT_PROBLEMS} problem(s) with agents! Please check!"
        RETURN_CODE=2
    else
        echo "All activated agents are ok."
        RETURN_CODE=0
    fi

    # cleanup
    rm -f tmp-replication-check.txt

Disclaimer: This script is an extract from a pretty old scripting framework I wrote multiple years ago. It has not been tested on its own and is not ready to run; it is just meant for illustration purposes. There is lots of room for improvement: error checks as well as sanitizations are missing, and there are probably other/better ways to achieve this. It's also very debatable whether bash is the right approach for this kind of task in the first place. It might be better to use an actual scripting language, or to develop a proper health check / endpoint in Java as part of your application that can be consumed directly by a monitoring system.
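One possible way to hook a check like this into a monitoring suite is to end it with a one-line status message and an exit code that follows the common Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following is only a minimal sketch under that assumption: the function name report_replication_status and the message wording are hypothetical and not part of the script above; only the values 0 and 2 mirror its RETURN_CODE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): turn a problem count
# into a one-line status message and a Nagios-style return code.
#   0 = OK, 2 = CRITICAL, 3 = UNKNOWN (input was empty or not a number)
report_replication_status() {
    local problems="$1"

    # Reject empty or non-numeric input so that a broken check
    # is never reported as OK.
    case "${problems}" in
        ''|*[!0-9]*)
            echo "UNKNOWN - could not determine replication agent status"
            return 3
            ;;
    esac

    if [ "${problems}" -gt 0 ]; then
        echo "CRITICAL - ${problems} problem(s) with replication agents"
        return 2
    fi

    echo "OK - all activated replication agents are ok"
    return 0
}

# Example use at the end of a check script:
#   report_replication_status "${REPL_AGENT_PROBLEMS}"
#   exit $?
```

Returning UNKNOWN for unusable input is a deliberate choice in this sketch: if the status page could not be fetched or parsed, the monitoring system should see that the check itself failed rather than a false "all clear".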
In addition to the monitoring part you should also investigate to find the root cause of frequently blocked replication queues as this is not a common/normal behavior. Hope this helps!

ManviSharma · Answer

Hi,

To monitor and get alerts for blocked replication queues in AEM:

Access the Adobe Granite Replication dashboard at http://localhost:4503/libs/granite/replication/content/status.html.
Monitor replication agents' status at http://localhost:4502/etc/replication/agents.author.html.
Set up alerts using AEM's "Notifications" feature for specific replication issues.
Analyze replication agent logs for debugging.
Consider using third-party monitoring tools for more extensive monitoring.
