We have a problem: the moment we click to replicate our `apps` package in Package Manager, all of our instances (author and publish) go down. They start returning HTTP 503 for everything, and we can't even access CRXDE or Package Manager.
I don't know whether that status code is genuine, because it is returned by Amazon ELB (our instances sit behind an ELB), and the ELB may be replacing the original status code.
We have done the same replication on local machines and in other official Adobe environments; the problem occurs in only one specific environment.
Hi guys, we opened a ticket with Daycare, and the problem is solved now.
Let me elaborate here so it can help others.
The real problem was caused by an ACS Commons feature called Versioned Clientlibs, which we had started using and which was ready to be deployed.
ACS Commons was installed correctly on our `author` instance, so we had no problems there. When we replicated our `apps` package to the `publish` instance, the instance went down because ACS Commons was missing there.
What we still don't understand is why `author` went down when the problem was on `publish`. It may be related to Amazon ELB.
We then replicated ACS Commons by accessing the author instance directly via its IP address.
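In case it helps anyone guard against the same mistake, here is a minimal sketch of a pre-replication check that a required bundle is present and active on the target instance. It assumes you have already fetched the bundle listing from the Felix web console (`/system/console/bundles.json`) and that the payload has a `data` array whose entries carry `symbolicName` and `state`. The symbolic name used in the sample is also an assumption, so check the real one in your own console.

```python
import json


def bundle_is_active(bundles_json: str, symbolic_name: str) -> bool:
    """Return True if the named bundle is listed and its state is Active.

    Assumes a payload shaped like the Felix console bundle listing:
    {"data": [{"symbolicName": "...", "state": "Active"}, ...]}
    """
    payload = json.loads(bundles_json)
    for bundle in payload.get("data", []):
        if bundle.get("symbolicName") == symbolic_name:
            return bundle.get("state") == "Active"
    # Bundle not listed at all: treat as missing.
    return False


# Hand-written sample in the assumed shape, not real console output.
SAMPLE_LISTING = json.dumps({
    "data": [
        {"symbolicName": "org.apache.felix.framework", "state": "Active"},
        {"symbolicName": "com.example.installed-but-stopped", "state": "Installed"},
    ]
})

# Assumed symbolic name of the ACS Commons bundle; verify before relying on it.
ACS_COMMONS = "com.adobe.acs.acs-aem-commons-bundle"

if not bundle_is_active(SAMPLE_LISTING, ACS_COMMONS):
    print("ACS Commons is not active on the target; do not replicate yet")
```

Running a check like this against each publish instance before replicating the `apps` package would have flagged the missing dependency in our case.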
Well, I would say this is the kind of behaviour I would expect.
Let me elaborate. Your apps package typically contains a lot of stuff: templates, components, bundles. When you install it on a system, you can see in the logs that a lot of activity starts: services restart, dependencies are re-wired, caches are rebuilt, and so on. There may be moments when rendering simply doesn't work. You are deploying your application, so this is expected behaviour. But this is not the core of your problem.
The problem is that you do this on all publish instances at the same time. When you replicate the package to all publish instances, it arrives on each of them at nearly the same moment, so you bring them all down at once. During that window the ELB cannot reach a working publish instance and returns a 503.
You need to establish a blue-green style of deployment for your publish instances. Split them into two distinct sets and deploy to one set at a time, leaving the other set unaffected. That isn't as easy as hitting "replicate" and requires more work or automation, but it's definitely worth investing the time.
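To make the idea concrete, here is a rough sketch of deploying one set at a time. It is not tied to any AEM or AWS API: `deploy` and `is_healthy` are hypothetical callables you would supply yourself (for example, a package install followed by a check that a test page renders), and the host names are made up.

```python
from typing import Callable, List, Sequence


def split_into_sets(instances: Sequence[str], n_sets: int = 2) -> List[List[str]]:
    """Distribute instances round-robin into n_sets groups."""
    return [list(instances[i::n_sets]) for i in range(n_sets)]


def deploy_in_sets(
    instances: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    n_sets: int = 2,
) -> List[str]:
    """Deploy to one set at a time and stop if a set does not recover.

    The untouched set keeps serving traffic while the other is updated.
    Returns the instances that were deployed and verified healthy.
    """
    done: List[str] = []
    for group in split_into_sets(instances, n_sets):
        for host in group:
            deploy(host)
        unhealthy = [host for host in group if not is_healthy(host)]
        if unhealthy:
            # Abort before touching the remaining set(s).
            raise RuntimeError(f"rollout stopped, unhealthy after deploy: {unhealthy}")
        done.extend(group)
    return done


# Example with stand-in callables (hypothetical host names).
deployed_order: List[str] = []
result = deploy_in_sets(
    ["publish1", "publish2", "publish3", "publish4"],
    deploy=deployed_order.append,
    is_healthy=lambda host: True,
)
```

In a real setup you would probably also take each set out of the load balancer before deploying and poll the health check until the instance has settled, rather than checking once.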