Can I cache 404s on Dispatcher?

jkpanera

13-05-2019

We are seeing a large amount of 404 requests on our AEM application putting undue load on our publish server. Found requests get cached on dispatcher, but 404s fall through to publish even if the same location was requested a minute ago.

Is there a way to configure dispatcher to cache 404s? For example if user A requests http://domain.com/content/not-exist.html, dispatcher could cache the error page at the location /content/not-exist.html. Then if we do publish not-exist.html, it would get evicted at that time.

Thanks!

Accepted Solutions (1)

Jörg_Hoh
Employee

14-05-2019

No, I am not aware of any way to cache 404 responses on dispatcher. That would be a negative cache entry (an entry recording that a file does not exist), and as far as I know it is not implemented. It would be a nice feature to have, though.

Can you raise a feature request on the dispatcher with Adobe support?

Jörg

Answers (11)

jkpanera

14-05-2019

Yeah, I appreciate the suggestion, gauravb10066713, but we won't be doing that. SEO may be one problem, but we also don't want users' browsers caching those responses as 200s.

Arun_Patidar
MVP

14-05-2019

It is not a good idea to serve the 404 page with a 200 response status; it would hurt SEO.

Gaurav-Behl
MVP

14-05-2019

As a temporary workaround until you get a patch from Support, you could set the response status to 200 in 404.jsp. The contents of 404.jsp would then be served with a 200 status and hence get cached on dispatcher.

Just make sure to evaluate the pros and cons of this approach.

jkpanera

14-05-2019

Will do! Thanks Joerg!

jkpanera

14-05-2019

Hi PuzanovsP​, it's both.

We are seeing some malicious traffic, though it looks like hackers attempting to harvest logins (wp-login.php) rather than an actual DDoS.

We also have authors who distribute bad links. That's an organizational problem and not one that I can deal with (though we are working on it).

Are you saying that dispatcher does not have the functionality I am requesting?

Thanks!

John

PuzanovsP
MVP

14-05-2019

What is the referrer for this traffic? Are these automated scripts hammering you, or are they your actual customers?

If they are your actual customers, look at the source of their journey and fix your site or partner site so that customers stop hitting 404s.

If they are automated scripts, consider putting your site behind Akamai's DDoS protection or a similar service, which does a very good job of shielding you from unwanted traffic.

Additionally, you could generate a list of allowed URLs from the publisher, upload it to Akamai, and let Akamai route only allowed requests to you.

Once you fix or implement these, the unwanted traffic you receive will drop.

Regards,

Peter

Arun_Patidar
MVP

14-05-2019

I am not sure if this is possible. But why are you getting so many 404s?

1. Are there broken links on your site?

2. Is it due to a DoS or DDoS attack?

jkpanera

13-05-2019

Arun, we've already configured our site as described in "Adobe CQ/Adobe AEM: How to cache Error page in CQ", but I don't believe that actually shields publish from requests to non-existent resources.

Rather it simply caches the 404 page in a fixed location on dispatcher so that publish only has to build the error page once. Useful, but not the problem we are trying to solve here.

Take a look at this log from error.log:

2019-05-13 21:36:10,605 *INFO* [192.168.56.1 [1557783370597] GET /content/domain_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found

2019-05-13 21:36:17,817 *INFO* [192.168.56.1 [1557783377813] GET /content/panerabread_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found

Those requests were generated by requesting a page from dispatcher that doesn't exist twice within one minute. If there was no resource there 7 seconds ago, there probably won't be one there now. I'd like dispatcher to cache that information until a resource is published at that location or until a specific timeout is reached. Is this possible?
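
The behavior requested above can be sketched in a few lines. This is only an illustration of the negative-caching idea, not a dispatcher feature or configuration (the accepted answer says dispatcher does not implement it). The names `NegativeCache`, `handle_request`, and `fetch_from_publish`, and the 60-second timeout, are hypothetical and chosen for the example.

```python
import time


class NegativeCache:
    """Remembers paths that recently returned 404, for at most ttl_seconds."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the timeout can be tested
        self._expires_at = {}  # path -> time after which the entry is stale

    def remember_missing(self, path):
        """Record that the backend just answered 404 for this path."""
        self._expires_at[path] = self.clock() + self.ttl_seconds

    def is_known_missing(self, path):
        """True only while a non-expired 404 entry exists for this path."""
        expires_at = self._expires_at.get(path)
        if expires_at is None:
            return False
        if self.clock() >= expires_at:
            del self._expires_at[path]  # timeout reached, forget the entry
            return False
        return True

    def evict(self, path):
        """Call when content is published at this path."""
        self._expires_at.pop(path, None)


def handle_request(path, cache, fetch_from_publish):
    """Return an HTTP status, asking publish only if the path is not a known 404."""
    if cache.is_known_missing(path):
        return 404
    status = fetch_from_publish(path)  # hypothetical callable returning a status code
    if status == 404:
        cache.remember_missing(path)
    return status
```

With this sketch, two requests for the same missing page within the timeout reach publish only once, and calling `evict` for a path when it is published makes the next request go through to publish again.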

jkpanera

13-05-2019

We are NOT sure it will be a 404. The request may or may not be valid. It depends on whether authors have put content there or not.

I'm talking about a dynamic strategy whereby dispatcher will NOT hit publish with the same request twice in the case of a non-existent page.

Perhaps this could be part of some attack or perhaps it could be that our marketing people have sent a bad link in an email. Regardless, dispatcher caches valid responses but 404s go through to publish. In the case of lots of requests, this can put too much load on the publish server.

Like I said, we'd like dispatcher to cache 404 pages instead of hitting publisher again.

Arun_Patidar
MVP

13-05-2019

Radha_Krishna_N

13-05-2019

If you are sure the requests always result in a 404, block the URL pattern before the requests reach dispatcher.
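
As a rough illustration of pattern-based blocking, the sketch below rejects paths that can never be valid on the site, such as the wp-login.php probes mentioned elsewhere in this thread. In practice this would live in a web server, CDN, or firewall rule rather than in Python. The patterns and the `is_blocked` helper are assumptions made for the example, and the function expects a path without a query string.

```python
import re

# Hypothetical patterns for requests an AEM site would never serve.
BLOCKED_PATTERNS = [
    re.compile(r"\.php$", re.IGNORECASE),  # e.g. /wp-login.php
    re.compile(r"^/wp-(admin|content|includes)(/|$)", re.IGNORECASE),
]


def is_blocked(path):
    """Return True if the request path matches any blocked pattern."""
    return any(pattern.search(path) for pattern in BLOCKED_PATTERNS)
```

A request for `/wp-login.php` would be rejected up front, while `/content/not-exist.html` would still be passed on, since only publish knows whether authors have put content there.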