Request for Feature Enhancement (RFE) Summary: |
When a server fails to deploy, the pipeline aborts, leaving servers in various states of deployment. Instead, when a publisher fails, the failed publisher/dispatcher pair should be left out of rotation and the deployment should continue to the next server. A notification should be sent to, or an incident created for, CSE and the customer. In an environment with 10 publisher/dispatcher pairs, one or even several failed pairs can be left out of rotation while the remaining pairs still serve traffic properly. The aborted pipelines currently cause delays of 8-9 hours, and sometimes take days to resolve. This costs all parties significant money and time. Please create a JIRA ticket for this request, let me know when it has been created, and share regular updates on its progress. |
Use-case: |
In large infrastructures, deployments can take 9-10 hours to complete. One failure can cost hundreds of thousands of dollars in delays and add anywhere from 12 hours to days of deployment time, leaving part of the infrastructure in varying states of deployment. Rolling back or manually deploying to the remaining servers is not an option: after 9-10 hours the content already has a delta, and manually deploying 30 packages across 21 servers takes even longer. This feature would significantly reduce deployment hours and cost, as well as the production server downtime caused by deployment failures due to infrastructure issues. |
Current/Experienced Behavior: |
There are 10 publisher/dispatcher pairs. The deployment fails on publisher 3 because publisher 3 is experiencing an issue or high load. When this happens, the entire deployment is aborted. This leaves publisher/dispatcher pairs 4 through 10 undeployed, so the author, publisher 1, and publisher 2 serve traffic on the new code while the rest of the stack serves traffic on the old code. Redeploying the pipeline with the re-execute feature can take another 10 hours. |
Improved/Expected Behavior: |
When a non-author server fails to complete deployment, keep that publisher/dispatcher pair out of rotation and continue the deployment to the rest of the infrastructure. Alert the customer and CSE that a publisher/dispatcher pair is out of rotation due to a deploy failure. Add a feature to the CM pipeline that allows deploying to a single publisher/dispatcher pair, or to any single server in the infrastructure. |
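The requested continue-on-failure behavior can be sketched as a simple loop. This is only an illustration of the request, not how the CM pipeline is implemented: `deploy_publish_tier`, `DeployReport`, and the `deploy`, `return_to_rotation`, and `notify` callbacks are all hypothetical names, and the sketch assumes each pair is already out of rotation while it is being deployed. Author failures are outside its scope.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class DeployReport:
    """Outcome of one (hypothetical) pipeline run over the publish tier."""
    deployed: List[str] = field(default_factory=list)
    out_of_rotation: List[str] = field(default_factory=list)


def deploy_publish_tier(
    pairs: Iterable[str],
    deploy: Callable[[str], None],
    return_to_rotation: Callable[[str], None],
    notify: Callable[[str], None],
) -> DeployReport:
    """Deploy to each pair in turn; a failure skips that pair, not the run."""
    report = DeployReport()
    for pair in pairs:
        try:
            deploy(pair)  # assumed to raise on any deployment failure
        except Exception as exc:
            # Leave the failed pair out of rotation, alert, and move on.
            report.out_of_rotation.append(pair)
            notify(f"{pair} left out of rotation after deploy failure: {exc}")
            continue
        # Only pairs that deployed cleanly go back into rotation.
        return_to_rotation(pair)
        report.deployed.append(pair)
    return report


# Illustrative run: 10 pairs, with pair 3 failing as in the scenario above.
def fake_deploy(pair: str) -> None:
    if pair == "pub3/disp3":
        raise RuntimeError("publisher under high load")


alerts: List[str] = []
pairs = [f"pub{i}/disp{i}" for i in range(1, 11)]
report = deploy_publish_tier(pairs, fake_deploy, lambda pair: None, alerts.append)
print(report.out_of_rotation)  # ['pub3/disp3']
print(len(report.deployed))    # 9
```

In this sketch the run ends with nine pairs on the new code and one alert raised for pair 3. Calling the same function with a one-element list would cover the requested single-pair deployment.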
Environment Details (AEM version/service pack, any other specifics if applicable): |
AEM 6.5.21: 21-server infrastructure (1 author, 10 publishers, 10 dispatchers). |
Customer-name/Organization name: |
Undisclosed |
Screenshot (if applicable): |
N/A |
Code package (if applicable): |
N/A |