Can I cache 404s on Dispatcher?

jkpanera

13-05-2019

We are seeing a large number of requests that result in 404s on our AEM application, putting undue load on our publish server. Requests for existing content get cached on dispatcher, but 404s fall through to publish even if the same location was requested a minute ago.

Is there a way to configure dispatcher to cache 404s? For example if user A requests http://domain.com/content/not-exist.html, dispatcher could cache the error page at the location /content/not-exist.html. Then if we do publish not-exist.html, it would get evicted at that time.

Thanks!

Accepted Solutions (1)

Jörg_Hoh

Employee

14-05-2019

No, I am not aware of any way to cache a 404 response on dispatcher. It would be a negative cache entry (that is, an entry recording that a file does not exist), and as far as I know it's not implemented. It would be a nice feature, though.

Can you raise a feature request on the dispatcher with Adobe support?

Jörg

Answers (11)

jkpanera

14-05-2019

Yeah, I appreciate the suggestion gauravb10066713, but we won't be doing that. SEO may be one problem, but we also don't want users' browsers caching those responses as 200s.

Gaurav-Behl

MVP

14-05-2019

As a temporary workaround until you get a patch from Support, you could set the response status to 200 in 404.jsp. The contents of 404.jsp would then be served as a 200 and hence get cached on dispatcher.

Just make sure to evaluate the pros and cons of this approach.

jkpanera

14-05-2019

Hi PuzanovsP, it's both.

We are seeing some malicious traffic, though it looks like hackers attempting to harvest logins (wp-login.php) rather than an actual DDoS.

We also have authors who distribute bad links. That's an organizational problem and not one that I can deal with (though we are working on it).

Are you saying that dispatcher does not have the functionality I am requesting?

Thanks!

John

PuzanovsP

MVP

14-05-2019

What is the referer for this traffic? Are these just automated scripts hammering you, or are these your actual customers?

If these are your actual customers, then look at the source of their journey and improve your site or partner site so that your customers stop hitting 404s.

If these are automated scripts, then consider moving behind Akamai's DDoS protection or a similar service; they do a very good job of protecting you from unwanted traffic.

Additionally, you could generate a list of allowed URLs from publish, upload it to Akamai, and let Akamai forward only allowed requests to you.

Once you fix or implement these, the unwanted traffic you receive will decrease.
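To illustrate the allowlist idea above, here is a minimal Python sketch of the decision logic only. It is not Akamai configuration, and the function names and example paths are hypothetical; how the list of published paths is exported from publish is left open.

```python
def normalize(path):
    """Drop the query string and any trailing slash so lookups are consistent."""
    path = path.split("?", 1)[0]
    return path.rstrip("/") or "/"


def build_allowlist(published_paths):
    """Build an immutable set of normalized paths known to exist on publish."""
    return frozenset(normalize(p) for p in published_paths)


def should_forward(request_path, allowlist):
    """Return True only for requests whose path is in the allowlist."""
    return normalize(request_path) in allowlist


# Hypothetical example data, not taken from a real site.
allowlist = build_allowlist([
    "/content/domain_com/en-us.html",
    "/content/domain_com/en-us/menu.html",
])
```

With this in place, should_forward("/wp-login.php", allowlist) is False, so such a request would never reach dispatcher or publish. The trade-off is that the list has to be regenerated whenever pages are published or removed.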

Regards,

Peter

Arun_Patidar

MVP

14-05-2019

I am not sure if this is possible. But why are you getting so many 404s?

1. Are there broken links on your site?

2. Is it due to a DoS or DDoS attack?
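One way to tell the two cases apart is to tally which paths are producing the 404s. The Python sketch below assumes error.log lines in the "Resource ... not found" form quoted elsewhere in this thread; the function name is hypothetical and the pattern may need adjusting for other log formats.

```python
import re
from collections import Counter

# Matches the Sling message "service: Resource <path> not found".
NOT_FOUND = re.compile(r"service: Resource (\S+) not found")


def count_not_found(lines):
    """Count how often each resource path was reported as not found."""
    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open error.log file to count_not_found and looking at most_common(20) on the result gives a quick picture: many hits on paths like wp-login.php suggest probing, while hits on plausible content paths suggest broken links.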

jkpanera

13-05-2019

Arun, we've already configured our site as described in "Adobe CQ/Adobe AEM: How to cache Error page in CQ", but I don't believe that actually shields publish from requests to non-existent resources.

Rather it simply caches the 404 page in a fixed location on dispatcher so that publish only has to build the error page once. Useful, but not the problem we are trying to solve here.

Take a look at this log from error.log:

2019-05-13 21:36:10,605 *INFO* [192.168.56.1 [1557783370597] GET /content/domain_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found

2019-05-13 21:36:17,817 *INFO* [192.168.56.1 [1557783377813] GET /content/panerabread_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found

Those log entries were generated by requesting a non-existent page from dispatcher twice within one minute. If there was no resource there 7 seconds ago, there probably won't be one there now. I'd like dispatcher to cache that information until a resource is published at that location or until a specific timeout is reached. Is this possible?
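To make the requested behavior concrete, here is a minimal Python sketch of a negative cache with a timeout and eviction on publish. It only illustrates the idea and is not something dispatcher provides; the class, the function names, and the fetch_from_publish callback are all hypothetical.

```python
import time


class NegativeCache:
    """Remembers paths that recently returned 404, for at most ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires = {}  # path -> time at which the entry expires

    def remember_missing(self, path):
        """Record that the path returned 404 just now."""
        self._expires[path] = self.clock() + self.ttl

    def is_known_missing(self, path):
        """Return True if a non-expired 404 entry exists for the path."""
        expiry = self._expires.get(path)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[path]  # timeout reached, forget the entry
            return False
        return True

    def invalidate(self, path):
        """Evict the entry, e.g. when a page is published at the path."""
        self._expires.pop(path, None)


def handle(path, cache, fetch_from_publish):
    """Return an HTTP status, asking publish only if the 404 is not cached."""
    if cache.is_known_missing(path):
        return 404
    status = fetch_from_publish(path)
    if status == 404:
        cache.remember_missing(path)
    return status
```

With a 60-second timeout, the second request in the log above would have been answered from the cache without reaching publish. Calling invalidate when a page is activated covers the "until a resource is published at that location" part.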

jkpanera

13-05-2019

We are NOT sure it will be a 404. The request may or may not be valid. Depends on if authors have put content there or not.

I'm talking about a dynamic strategy whereby dispatcher will NOT hit publish with the same request twice in the case of a non-existent page.

Perhaps this could be part of some attack or perhaps it could be that our marketing people have sent a bad link in an email. Regardless, dispatcher caches valid responses but 404s go through to publish. In the case of lots of requests, this can put too much load on the publish server.

Like I said, we'd like dispatcher to cache 404 pages instead of hitting publisher again.