We are seeing a large number of requests that return 404 on our AEM application, putting undue load on our publish server. Requests for pages that exist get cached on dispatcher, but 404s fall through to publish even if the same location was requested a minute ago.
Is there a way to configure dispatcher to cache 404s? For example, if user A requests http://domain.com/content/not-exist.html, dispatcher could cache the error page at the location /content/not-exist.html. Then if we later publish not-exist.html, the cached entry would be evicted at that time.
Thanks!
If you are sure the requests always result in 404, block the URL pattern before the request reaches dispatcher.
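To illustrate the idea of blocking by pattern, here is a minimal Python sketch of the matching logic only. The patterns and the `is_blocked` name are hypothetical examples, not anything from the dispatcher or web server configuration; in practice this check would live in whatever sits in front of dispatcher.

```python
import re

# Hypothetical patterns for paths this AEM site is assumed never to serve.
BLOCKED_PATTERNS = [
    re.compile(r"\.php$", re.IGNORECASE),
    re.compile(r"^/wp-(admin|content|includes)(/|$)", re.IGNORECASE),
]


def is_blocked(path: str) -> bool:
    """Return True if the request path matches any blocked pattern."""
    return any(pattern.search(path) for pattern in BLOCKED_PATTERNS)
```

A request for which `is_blocked` returns True could be rejected immediately without ever reaching dispatcher or publish. This only helps for paths that are known in advance to be invalid.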
Please check Adobe CQ/Adobe AEM: How to cache Error page in CQ
We are NOT sure it will be a 404. The request may or may not be valid. Depends on if authors have put content there or not.
I'm talking about a dynamic strategy whereby dispatcher will NOT hit publish with the same request twice in the case of a non-existent page.
Perhaps this could be part of some attack or perhaps it could be that our marketing people have sent a bad link in an email. Regardless, dispatcher caches valid responses but 404s go through to publish. In the case of lots of requests, this can put too much load on the publish server.
Like I said, we'd like dispatcher to cache 404 pages instead of hitting publisher again.
Arun, we've already configured our site like so (Adobe CQ/Adobe AEM: How to cache Error page in CQ), but I don't believe that actually shields publish from requests to non-existent resources.
Rather it simply caches the 404 page in a fixed location on dispatcher so that publish only has to build the error page once. Useful, but not the problem we are trying to solve here.
Take a look at this log from error.log:
2019-05-13 21:36:10,605 *INFO* [192.168.56.1 [1557783370597] GET /content/domain_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found
2019-05-13 21:36:17,817 *INFO* [192.168.56.1 [1557783377813] GET /content/panerabread_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found
Those requests were generated by requesting a non-existent page from dispatcher twice within one minute. If there was no resource there 7 seconds ago, there probably won't be one there now. I'd like dispatcher to cache that information until a resource is published at that location or until a specific timeout is reached. Is this possible?
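To make the requested behavior concrete, here is a minimal Python sketch of a negative cache with a timeout and eviction on publish. It only models the logic being asked for; it is not a dispatcher feature or configuration. `NegativeCache`, `handle`, and `fetch_from_publish` are hypothetical names, and the injectable clock is an assumption added so the timeout can be exercised without waiting.

```python
import time
from typing import Callable, Dict


class NegativeCache:
    """Remembers paths that recently returned 404, for a limited time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def record_not_found(self, path: str) -> None:
        # Store the moment at which this negative entry stops being trusted.
        self._expires_at[path] = self._clock() + self._ttl

    def is_known_missing(self, path: str) -> bool:
        expires = self._expires_at.get(path)
        if expires is None:
            return False
        if self._clock() >= expires:
            # Timeout reached: forget the entry and ask publish again.
            del self._expires_at[path]
            return False
        return True

    def evict(self, path: str) -> None:
        # Would be called when content is published at this path.
        self._expires_at.pop(path, None)


def handle(path: str, cache: NegativeCache,
           fetch_from_publish: Callable[[str], int]) -> int:
    """Return an HTTP status, contacting publish only when necessary."""
    if cache.is_known_missing(path):
        return 404
    status = fetch_from_publish(path)
    if status == 404:
        cache.record_not_found(path)
    return status
```

With something like this in place, the second request in the log above would be answered with a 404 without reaching publish, until either the timeout expires or `evict` is called for that path.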
I am not sure if this is possible. But why are you getting so many 404s?
1. Are there broken links in your site?
2. Is it due to a DoS or DDoS attack?
Who is the referer for this traffic? Are these just automated scripts hammering you, or are these your actual customers?
If these are your actual customers, then look at the source of their journey and improve your site/partner site so that your customers stop hitting 404s.
If these are automated scripts, then consider moving behind DDoS Protection | Akamai or similar tech, which does a very good job of protecting you from unwanted traffic.
Additionally, you may generate a list of allowed URLs via Publisher, upload it to Akamai and let Akamai route only allowed requests to you.
Once you fix/implement these, the unwanted traffic you receive will decrease.
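As a rough illustration of the allow-list idea, here is a small Python sketch of the lookup logic. It does not show how Akamai is actually configured; `build_allow_list` and `is_allowed` are hypothetical names, and the sketch assumes the list of published paths has already been exported from Publisher by some other means.

```python
from typing import FrozenSet, Iterable
from urllib.parse import urlsplit


def build_allow_list(published_paths: Iterable[str]) -> FrozenSet[str]:
    """Normalize exported paths (trailing slashes removed) into a set."""
    return frozenset(path.rstrip("/") or "/" for path in published_paths)


def is_allowed(url: str, allow_list: FrozenSet[str]) -> bool:
    """Check only the path component, ignoring host and query string."""
    path = urlsplit(url).path.rstrip("/") or "/"
    return path in allow_list
```

The trade-off is that the list has to be regenerated and uploaded again whenever authors publish or unpublish pages, otherwise new pages would be rejected at the edge.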
Regards,
Peter
Hi PuzanovsP, it's both.
We are seeing some malicious traffic, though it looks like hackers attempting to gather logins (wp-login.php) rather than an actual DDoS.
We also have authors who distribute bad links. That's an organizational problem and not one that I can deal with (though we are working on it).
Are you saying that dispatcher does not have the functionality I am requesting?
Thanks!
John
No, I am not aware of a way to cache the 404 response on dispatcher. It would be a negative cache entry (that is, an entry that records a non-existing file), and afaik it's not implemented. Having it would be a nice feature, though.
Can you raise a feature request on the dispatcher with Adobe support?
Jörg
Will do! Thanks Joerg!
As a temporary solution until you get a patch from Support, you could set the response status to 200 in 404.jsp; the contents of 404.jsp would then be served with a 200 and hence get cached in dispatcher.
Just make sure to evaluate the pros/cons of this approach.
Not a good idea to serve a 404 page with a 200 response; it would impact SEO.
Yeah, I appreciate the suggestion gauravb10066713 but we won't be doing that. SEO may be one problem, but we also don't want users' browsers caching those responses as 200s.
@jkpanera did you find any solution for this?