Hi
Recently I ran into a strange issue: our publishers were responding very slowly, so I took thread dumps to analyze the problem.
There were no blocked threads, but many threads were in the Waiting state while fetching an image. I also raised a support ticket, and support told me that because the image returns a 404, every one of those requests reaches the publisher and slows it down.
So here is what I am trying to understand: why does AEM take so long to return a 404 for a resource, such as an image, that does not exist?
If 404s for an image can make an AEM publisher slow, then any small bot that hits the publisher with a malformed URL for a JPG resource could easily bring down the application.
Am I missing something here?
First of all, any item not in the dispatcher cache results in a request to AEM publish. That means if you have a large number of 404s, all of them will reach the publish instance. So your first and primary mission should be to make sure that no request causes a 404, which means no dangling references in your own site.
What you cannot avoid is someone malicious trying to flood your systems with requests that cannot be served from the dispatcher, or that intentionally bypass your dispatcher rules. That is always possible and you can hardly mitigate it.
And if someone really wants to bring you down, they will always have a chance. If they cannot overload your publish instances, they will overload your uplink to the internet. At a time when a single botnet can cause traffic surges of terabits per second, you cannot handle this on your own anymore, and definitely not by locking down your poor dispatchers and publish instances.
That means your prime concern should be self-DoSing, where regular end-user traffic backfires and kills you, whether through these 404s, through requests for too many non-cached files, or by any other means. Don't let that happen.
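To make the "no dangling references" advice concrete, here is a minimal Python sketch of one way to check rendered pages for image references that do not resolve. It is not an AEM API: `find_dangling_images` and the `known_paths` set are hypothetical names, and how you obtain the page HTML and the list of valid asset paths is left open.

```python
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_dangling_images(html, known_paths):
    """Return image references in html that are not in known_paths.

    known_paths is an assumption: a set of asset paths you know exist,
    for example exported from the repository or a sitemap.
    """
    collector = ImageRefCollector()
    collector.feed(html)
    return [src for src in collector.sources if src not in known_paths]
```

Running something like this against published pages before go-live would flag references that are likely to produce 404s on the publish instance.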
This doesn't seem to be an issue with 404s or with the type or content of the request, but rather with memory allocation, CPU, or I/O processing. Each server has a limited amount of memory, heap, and CPU allocated to it at startup. If that falls short under heavy load, the server can respond slowly, but that should be momentary unless some other process is continuously hogging CPU or memory, or the hardware sizing is not appropriate.
Were you able to find out approximately how many requests the server was processing at that time? Did you get a chance to dig deeper and analyze why the 404s were processed slowly? Did you observe any other process, GC, compaction, or anything else running at that time in the logs or dumps?
Yes, a bot can potentially bring the servers down, which is a typical example of a DoS attack. There are several ways to mitigate it and to implement preventive mechanisms.
From a solution perspective, you should find the real root cause. You can fix the 404 issues, configure Apache to handle 404s, or tweak the dispatcher configuration to block, convert, or redirect specific 404s, if that is applicable to your use case.
HTH
I observed the CPU utilization of the publishers during that time, and all of them were at 40-50%.
Memory usage was hovering around 65-70 during that time.
We have mitigation for DDoS attacks, but it works at the page-view level. It is hard to put such a mechanism in place for requests that fetch image resources.
The thread dumps show that these threads were waiting to fetch those images. Once I placed a deny rule in the dispatcher for those 404 images, the system went back to normal.
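For anyone who wants to repeat this kind of check, below is a small Python sketch that counts threads per state in a thread dump. It assumes the dump is in the usual jstack text format, where each thread has a `java.lang.Thread.State: ...` line; `count_thread_states` is just an illustrative helper name.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: WAITING (parking)"
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text):
    """Return a Counter mapping thread state name to number of threads."""
    return Counter(STATE_RE.findall(dump_text))
```

Comparing the counts across several dumps taken a few seconds apart shows whether the number of waiting threads keeps growing or stays stable.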
Hi,
Did you try taking a look at the heap dump report?
Jörg Hoh, we need your assistance on this.
I haven't seen any out-of-memory errors or high CPU utilization. The only thing I noticed at that time was a lot of requests for that image reaching the publisher (it was supposed to be served from the dispatcher, but the wrong URL led to a 404). That's why I took thread dumps rather than a heap dump.
I was surprised to see that 404 hits can bring down the publisher.
Below is the reply from Adobe Daycare:
"
The Thread Dumps show approx. 195 GET requests running concurrently on the Publish instance.
A high number of these requests are getting a XX-barcelona.jpg file, but the requested paths are not starting with /content/dam as we could expect:
404 GET /content/XX/XX/content/dam/images/XX-barcelona.jpg
404 GET /content/XX/XX/content/dam/images/renditions/XX-barcelona.jpg/jcr:content/renditions/rendition.XX.XX.jpg
404 GET /content/XX/XX/images/chat-XX.jpg
404 GET /content/XX/XX/images/image-our-XX.jpg
404 GET /content/XX/XX/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/XX/images/chat-XX.jpg
404 GET /content/XX/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/images/chat-XX.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/images/chat-XX.jpg
404 GET /content/XX/images/helpfaq-contact-XX.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/images/image-our-XX.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg/jcr:content/renditions/rendition.XX.XX.jpg
404 GET /content/XX/images/chat-placeholder.jpg
404 GET /content/XX/content/dam/images/renditions/XX-barcelona.jpg
These calls ended with a 404 HTTP error, and were therefore not cached and delivered by the dispatcher.
After adding a deny rule to the dispatcher configuration for the XX-barcelona.jpg, the performance of the publish instances slowly improved."
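A list like the one in the support reply is easier to act on when it is grouped by path. Here is a rough Python sketch that does that, assuming each line has the simplified `<status> <method> <path>` shape shown in the reply (a real access log would need a different pattern). `top_404_paths` and `is_misplaced_dam_path` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed line shape, taken from the list above: "404 GET /some/path"
LINE_RE = re.compile(r"^(\d{3}) (GET|HEAD|POST) (\S+)$")


def top_404_paths(lines, limit=5):
    """Count 404 responses per path and return the most frequent ones."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(1) == "404":
            counts[match.group(3)] += 1
    return counts.most_common(limit)


def is_misplaced_dam_path(path):
    """True if '/content/dam/' occurs in the path but not at its start."""
    return "/content/dam/" in path and not path.startswith("/content/dam/")
```

The second helper flags paths where `/content/dam/` is nested under a site path, which is the pattern support pointed out in most of the requests above.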
Are you seeing fewer requests reaching the publisher now, since they should be handled by the dispatcher?