If you are a long-time user of AEM 6.x (or even CQ5), you are probably familiar with the Asset Update workflow. Its primary task is to extract metadata from the binary asset and to create (smaller) renditions of it. This workflow normally runs on the AEM authoring instance.
But from the beginning, this approach has been plagued with problems:
The first is the question of supported file types. Given the almost unlimited number of file formats and their often proprietary implementations, it’s not always possible to perform these operations. In many cases, Java support for these file types is poor.
Additionally, depending on the size and type of the asset and on the quality of the library that supports this file type, the processing can be very time-consuming and can also consume a lot of heap. Imagine that you want to create renditions of a TIFF file with dimensions of 10k * 10k pixels: assuming 24 bits per pixel, this requires 300 megabytes of contiguous heap to store an uncompressed version of it. You have to size the heap accordingly, otherwise you will run out of memory (OOM).
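To make the arithmetic above concrete, here is a minimal sketch in Java. The class and method names are made up for this illustration and are not part of AEM. It only estimates the raw pixel buffer; a real imaging library will typically need additional working memory on top of that.

```java
public class RasterHeapEstimate {

    /**
     * Bytes needed to hold one uncompressed raster in memory.
     * Hypothetical helper for illustration only, not an AEM API.
     */
    public static long uncompressedBytes(long widthPx, long heightPx, int bitsPerPixel) {
        // Round up to whole bytes per pixel (24 bit -> 3 bytes).
        long bytesPerPixel = (bitsPerPixel + 7) / 8;
        // multiplyExact throws instead of silently overflowing on absurd inputs.
        return Math.multiplyExact(Math.multiplyExact(widthPx, heightPx), bytesPerPixel);
    }

    public static void main(String[] args) {
        long bytes = uncompressedBytes(10_000, 10_000, 24);
        // prints "300000000 bytes (~300 MB)"
        System.out.println(bytes + " bytes (~" + bytes / 1_000_000 + " MB)");
    }
}
```

Keep in mind that this is the cost of a single in-memory copy. If several such assets are processed in parallel, the numbers add up.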
To avoid these issues, external tools like ImageMagick were used for many file types. They support a wide range of image types (in many cases much better than the Java image libraries), and they do not bring down the AEM process when processing fails (because ImageMagick runs in a dedicated process). But the capabilities of ImageMagick are limited as well, and the support for more exotic (non-image) file types could be better.
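The isolation argument can be sketched with plain Java: the conversion runs as a child process, so a crash or an out-of-memory situation there does not take the JVM down. This is only an illustration and not how AEM wires it up internally. The class name, the method names and the timeout handling are assumptions; the sketch also assumes an ImageMagick 6 style `convert` binary on the PATH (ImageMagick 7 uses `magick` instead).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ExternalRendition {

    /**
     * Builds the command line for a rendition that fits into a square box.
     * Assumes an ImageMagick 6 style "convert" binary; hypothetical helper.
     */
    public static List<String> buildCommand(Path source, Path target, int maxEdgePx) {
        // A WxH geometry scales the image to fit the box and keeps the aspect ratio.
        String geometry = maxEdgePx + "x" + maxEdgePx;
        return List.of("convert", source.toString(), "-resize", geometry, target.toString());
    }

    /**
     * Runs the command in its own OS process and reports success.
     * A failing or hanging tool only affects the child process.
     */
    public static boolean run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard tool output so a chatty tool cannot block on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a hanging conversion
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

A call like `run(buildCommand(source, target, 1280), 60)` returns `false` on a non-zero exit code or a timeout, and the caller can then decide whether to retry or to skip the rendition.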
In all cases you need to size your hardware for the worst-case scenario. For example, you need to provision a lot of heap if your authors might start to ingest large images. And you need to provision enough CPU to mitigate the negative impact on all other operations.
Another big problem is latency. Assuming that your asset is very large (it’s not uncommon to have assets larger than 1 gigabyte), it takes time to copy the binary from the (remote) datastore to the location where the processing takes place. Even if you can transfer 100 MiB per second, it takes roughly 10 seconds to transfer the file to the local disk. Normally this transfer runs through the AEM JVM, which is problematic in terms of heap usage and can also cause performance problems. Not to mention code that is not aware of the possible sizes and tries to load the complete stream into memory.
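As a contrast to code that reads the complete stream into memory, here is a minimal sketch of spooling a binary to a local file through a small fixed-size buffer. The class name, the method name and the 64 KiB buffer size are assumptions for illustration; where the `InputStream` comes from (for example the asset's original rendition) is left out on purpose.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingSpool {

    /**
     * Copies the stream to the target file chunk by chunk.
     * Heap usage stays at the buffer size, no matter how large the binary is.
     * Hypothetical helper for illustration, not an AEM API.
     */
    public static long spoolToFile(InputStream in, Path target) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // assumed buffer size
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total; // number of bytes written
    }
}
```

This keeps the heap footprint constant, but it does not remove the latency: the bytes still have to travel from the datastore to the local disk before any processing can start.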