
Autoscaling AEM


Level 5

I was wondering if anybody out there has experience hosting AEM in Amazon Web Services with autoscaling set up, e.g. spinning up another dispatcher and/or publisher as load increases. If so, can you share whether this is a sensible approach to take? Some stakeholders have marked this as a significant requirement, so I'm trying to establish whether it's really something AEM is set up to allow, as all my past experience has been estimating load beforehand and then, as demand grows over time, bringing new instances online manually.

9 Replies


Level 8

I've not seen this approach taken before, but it does sound interesting, so I'm going to follow this thread for responses.

Out of curiosity, what are the monthly page views of the site in question? Does it have traffic spikes that would require a new dispatcher to be started?

I don't think this would be achievable for publish instances, for two reasons:

1)  Licensing - you're licensed per publish instance, so you'd basically have to keep a number of spare licenses around just in case you needed to spin one up.
2)  Content - content is activated to the publish instances, so unless the instance is running all of the time, none of the content would exist on the new publish instance, making it pointless.

Technically you could have some spare publish instances running in the background, but why? Let the load balancer spread the load across everything.


Level 5

The site has around 1M views per day, with two peaks per day, dropping off to very low levels during the night. However, they are not what I would describe as dramatic spikes. They currently run an older version of AEM and have had the same number of publishers/dispatchers for several years! I think some senior people have been bitten by the AWS cloud bug, though!!

I agree about the licenses: the potential saving from automating publisher spin-up, and the reduced server running costs from being able to scale down the number of instances, is offset by the licenses sitting idle. I'm not sure if Adobe offers some sort of licensing model that fits this usage pattern.

As for content, you could potentially get around it by letting AWS clone the Elastic Block Store volume of a publisher and assign it to the new instance, although I'm really not sure how feasible that is.
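For what it's worth, the EBS-clone idea is scriptable with the AWS CLI. Here is a dry-run sketch that only prints the two calls involved (the volume ID, snapshot ID, and availability zone are placeholders; in practice you would also have to poll until the snapshot reaches the completed state before creating the volume):

```shell
# Dry-run sketch of the EBS-clone idea: print the two AWS CLI calls that
# would snapshot a populated publisher volume and create a copy for a new
# instance. Volume ID, snapshot ID, and AZ are placeholders.
VOLUME_ID="vol-0123456789abcdef0"   # existing publish repository volume
AZ="eu-west-1a"

SNAPSHOT_CMD="aws ec2 create-snapshot --volume-id $VOLUME_ID --description publish-repo-clone"
# The snapshot must reach the 'completed' state before the volume is created.
CLONE_CMD="aws ec2 create-volume --snapshot-id snap-REPLACEME --availability-zone $AZ"

echo "$SNAPSHOT_CMD"
echo "$CLONE_CMD"
```

Note that snapshot time grows with repository size, which is exactly why this is hard to do fast enough for on-demand scaling.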

I think spinning up dispatcher instances to meet increases in demand is slightly more sensible and achievable, but again, why bother? Why not just provision them for maximum traffic and leave them running?

But yeah, I don't think this is a sensible option... but it's the cloud, so it must be better ;)


Level 1

Adobe recommends a shared-nothing architecture for your publishers - meaning each publisher holds its own Oak repository - usually on-disk.  The size of your Oak repository on your EBS volume will determine how long it takes to make a snapshot of that EBS volume.  As you're aware, you can't just hydrate a publisher repository on-demand - it can only be populated with either a clone of an already-populated repository or by re-activating all of your content.

AWS's base autoscaling event lifecycles do not include any cloning functionality. You could attach your own lifecycle hooks (this must be done via the AWS API and is not available in the web console) and call out to Lambda functions to handle your scale-out tasks. However, you will never autoscale on demand fast enough to meet an immediate rise in traffic. The shared-nothing, on-disk repository architecture makes AEM a cloud-unfriendly and operationally heavy product.
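A dry-run sketch of what such a launch-time hook might look like via the AWS CLI (the group name, ARNs, and timeout below are hypothetical placeholders, not a tested setup):

```shell
# Dry-run sketch: attach a launch lifecycle hook to a hypothetical publish
# Auto Scaling group. Group name, account ID, and ARNs are placeholders.
ASG_NAME="aem-publish-asg"
HOOK_CMD="aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name clone-repo-on-launch \
  --auto-scaling-group-name $ASG_NAME \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
  --notification-target-arn arn:aws:sns:eu-west-1:111111111111:aem-scale-out \
  --role-arn arn:aws:iam::111111111111:role/aem-lifecycle \
  --heartbeat-timeout 3600"
# A Lambda subscribed to the SNS topic would then do the repository clone
# and call complete-lifecycle-action when the new publisher is ready.
echo "$HOOK_CMD"
```

The long heartbeat timeout is the point: cloning a large repository can easily exceed the defaults, which is why this never feels like real on-demand scaling.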

I do recommend looking at the tooling from Shine Solutions: https://github.com/shinesolutions/aem-orchestrator - while you're still bound by the Oak repository limitations, their tooling does make your life significantly simpler.


Level 1

AWS is currently in use by many firms for their AEM architecture (I can't name them, but unofficially I know of at least three multi-billion-dollar firms).

AndrewC_EA - autoscaling can be achieved when the AEM architecture has the right instance type defined for the usage scenarios. Publish plays a vital role during member outages, but in a typical scenario, if the author has a CRX repository and datastore of 1 TB, the publish would be ~500 GB. With configuration, we can separate the CRX repository and the datastore. The datastore can be synced in real time on serverless architecture, and when scaling is required, we can make an offline copy of the CRX repository and quickly bring the server up. This is not an easy task, but there are firms that have achieved it. There are also other factors, such as which OS AEM is hosted on: on Windows Server 2016, for example, we can use DFSR. With the improvements in AWS, autoscaling is possible.
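On a Linux host (the post mentions DFSR for the Windows case), the "offline copy of CRX" step might look like the following dry-run sketch. The paths are the AEM defaults and the target hostname is a placeholder; the source publisher must be stopped or quiesced so the segment store is consistent:

```shell
# Dry-run sketch of the "offline copy of CRX" step: copy a stopped (or
# quiesced) publisher's segment store to the new node. Paths are the AEM
# defaults; the target hostname is a placeholder.
REPO="/opt/aem/crx-quickstart/repository/segmentstore/"
COPY_CMD="rsync -a --delete $REPO ec2-user@new-publish:$REPO"
echo "$COPY_CMD"
```

With the datastore separated out (as described above), this copy only has to move the node data, not the binaries, which is what makes "quickly bring the server up" plausible at all.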

There is a good article (published in 2016) on how this can be achieved. Again, it requires proper analysis of performance stats along with proper architecture sizing:

https://d0.awsstatic.com/whitepapers/Adobe%20AEM%20on%20AWS.pdf

I will try to see if I can publish one with the new 2017 AWS features and customizations.


Level 2

Thanks, very interesting! Will spend some time going through that


Level 2

Thanks Andrew for mentioning aem-orchestrator.

I'm the caretaker of AEM OpenCloud, which aem-orchestrator is part of.

AEM OpenCloud is an open source implementation of AEM infrastructure on AWS.

One of the architectures it provides is based on the whitepaper mentioned in previous comments.

AEM OpenCloud provides open source libraries which cover creating machine images, creating AEM environment with features such as blue-green deployment, auto recovery, auto scaling, package and snapshot backups, deployment descriptor, content health check, various metrics and alarms, and many others that would take a while to describe.

It also provides an alternative and simpler architecture with just a single EC2 instance running AEM author, publish, and dispatcher. This is used for automated code commit / pull request regression testing: when someone creates a pull request, an environment is automatically created to run a full regression test against, and that instance is terminated afterward.

If anyone is interested to learn more, feel free to ask more questions here or email me at cliff.subagio@shinesolutions.com .

There is a slide deck at AEM OpenCloud

All code is available on GitHub (search for topic:aem org:shinesolutions).

Not only is everything open source, AEM OpenCloud is already used in production by some of Australia's largest enterprises.


Level 1

I've been wondering if it is possible to separate content from code in such a way that we can containerize an AEM instance with just the code (OSGi bundles, templates, components, client libraries, etc.), which could be used to quickly spin up a new publish instance or to prepare the next code release, while the content (e.g. pages, assets, data sets) is maintained in a separate layer and made available to the new instance.

In our experience, cloning an entire Publisher instance takes too much time and makes it impossible to benefit fully from AWS auto scaling.

Any views on this - or ideas on how to achieve this?


Level 1

From what I understand from https://jackrabbit.apache.org/oak/docs/query/lucene.html#Persisting_indexes_to_FileSystem , Oak by default stores Lucene indexes within the `NodeStore`, which could be a remote MongoDB or an RDBMS. In combination with clustering (https://jackrabbit.apache.org/oak/docs/clustering.html), this seems to mean that one could simply connect new instances to the same RDBMS or MongoDB, and they should just work without copying any data. The RDBMS or MongoDB would then itself need to be clustered appropriately for availability and to be able to handle the load.
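If anyone wants to experiment with this, switching the NodeStore is an OSGi configuration dropped into the install folder. A hypothetical sketch, with placeholder hosts and database name (property names as per Oak's DocumentNodeStoreService; not verified against any particular AEM version):

```
# crx-quickstart/install/org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config
# Hypothetical sketch: point each publish instance at the same MongoDB
# replica set so a new instance starts against the shared repository.
mongouri="mongodb://mongo-1:27017,mongo-2:27017,mongo-3:27017/?replicaSet=aem"
db="aem"
```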


Some caveats seem to apply that may impact performance:

- the Lucene index must not be stored in the file system (see the Persisting_indexes link above)
- probably no CopyOnWrite or CopyOnRead
- no TarMK node storage

It would be great if somebody in the know could shed some light on whether this does in fact work in practice.


Employee Advisor

With a shared S3 datastore, all the binaries sit in S3 rather than on the instances. By tuning the S3 configuration so that the node store keeps minimal data, you can have almost everything served from S3.
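For reference, the shared S3 datastore is set up through the S3 connector's OSGi config. A hypothetical sketch with placeholder credentials, bucket, and region (property names as used by the AEM S3 connector; cacheSize is the local cache in bytes):

```
# crx-quickstart/install/org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore.config
# Hypothetical sketch: shared S3 datastore so binaries live in one bucket
# rather than on each instance. Credentials, bucket, and region are
# placeholders; in production an IAM instance role is preferable to keys.
accessKey="PLACEHOLDER"
secretKey="PLACEHOLDER"
s3Bucket="aem-shared-datastore"
s3Region="eu-west-1"
cacheSize="16000000000"
```

On TarMK instances, the segment store config also needs customBlobStore set to true so the external datastore is actually used.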