SOLVED

Can I cache 404s on Dispatcher?

Level 5

We are seeing a large number of requests that result in 404s on our AEM application, putting undue load on our publish server. Requests for existing pages get cached on dispatcher, but 404s fall through to publish even if the same location was requested a minute ago.

Is there a way to configure dispatcher to cache 404s? For example if user A requests http://domain.com/content/not-exist.html, dispatcher could cache the error page at the location /content/not-exist.html. Then if we do publish not-exist.html, it would get evicted at that time.

Thanks!

1 Accepted Solution

Correct answer by
Employee Advisor

No, I am not aware of any way to cache a 404 response on dispatcher. It would be a negative cache entry (that is, an entry recording that a file does not exist), and AFAIK it's not implemented. Having it would be a nice feature, though.

Can you raise a feature request on the dispatcher with Adobe support?

Jörg

13 Replies

Level 4

If you are sure the requests always result in a 404, block the URL pattern before it reaches dispatcher.

Level 5

We are NOT sure it will be a 404. The request may or may not be valid. Depends on if authors have put content there or not.

I'm talking about a dynamic strategy whereby dispatcher will NOT hit publish with the same request twice in the case of a non-existent page.

Perhaps this could be part of some attack or perhaps it could be that our marketing people have sent a bad link in an email. Regardless, dispatcher caches valid responses but 404s go through to publish. In the case of lots of requests, this can put too much load on the publish server.

Like I said, we'd like dispatcher to cache 404 pages instead of hitting publisher again.

Level 5

Arun, we've already configured our site as described in "Adobe CQ/Adobe AEM: How to cache Error page in CQ", but I don't believe that actually shields publish from requests to non-existent resources.

Rather it simply caches the 404 page in a fixed location on dispatcher so that publish only has to build the error page once. Useful, but not the problem we are trying to solve here.

Take a look at this log from error.log:

2019-05-13 21:36:10,605 *INFO* [192.168.56.1 [1557783370597] GET /content/domain_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found

2019-05-13 21:36:17,817 *INFO* [192.168.56.1 [1557783377813] GET /content/panerabread_com/en-us/not-exist.html HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/domain_com/en-us/not-exist.html not found

Those requests were generated by requesting a non-existent page from dispatcher twice within one minute. If there was no resource there 7 seconds ago, there probably won't be one there now. I'd like dispatcher to cache that information until a resource is published at that location or until a specific timeout is reached. Is this possible?
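To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is plain Python rather than anything dispatcher provides; the names `NegativeCache`, `handle_request`, and `fetch_from_publish` are hypothetical, and the 60-second timeout is just an assumed default.

```python
import time


class NegativeCache:
    """Remembers paths that recently returned 404, for a limited time."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed default, not a dispatcher setting
        self._clock = clock
        self._expires_at = {}  # path -> time after which the entry is stale

    def remember_missing(self, path):
        """Record that the backend just answered 404 for this path."""
        self._expires_at[path] = self._clock() + self.ttl_seconds

    def is_known_missing(self, path):
        """True if a 404 for this path was recorded and has not yet expired."""
        expires_at = self._expires_at.get(path)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expires_at[path]  # timeout reached: ask the backend again
            return False
        return True

    def invalidate(self, path):
        """Evict the entry, e.g. when a page is published at this path."""
        self._expires_at.pop(path, None)


def handle_request(path, cache, fetch_from_publish):
    """Return an HTTP status, consulting publish only when necessary."""
    if cache.is_known_missing(path):
        return 404  # answered from the negative cache; publish is not hit
    status = fetch_from_publish(path)  # hypothetical callable returning a status
    if status == 404:
        cache.remember_missing(path)
    return status
```

With something like this in front of publish, the second request in the log above would be answered without reaching publish, and publishing a page would call `invalidate` for its path.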

Community Advisor

I am not sure if this is possible. But why are you getting so many 404s?

1. Are there broken links on your site?

2. Is it due to a DoS or DDoS attack?



Arun Patidar

Community Advisor

Who is the referrer for this traffic? Are these just automated scripts hammering you, or are these your actual customers?

If these are your actual customers, then look at the source of their journey and improve your site or partner site so that customers stop hitting 404s.

If these are automated scripts, then consider putting your site behind Akamai DDoS protection or a similar service, which does a very good job of protecting you from unwanted traffic.

Additionally, you could generate a list of allowed URLs via Publisher, upload it to Akamai, and let Akamai route only allowed requests to you.

Once you fix or implement these, the unwanted traffic you receive should drop.
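The allowlist idea can be sketched roughly as follows. This is plain Python for illustration, not Akamai configuration; `load_allowlist` and `is_allowed` are hypothetical names, and the example paths are made up.

```python
def load_allowlist(lines):
    """Build a set of allowed paths from an iterable of text lines."""
    return {line.strip() for line in lines if line.strip()}


def is_allowed(path, allowlist):
    """True if the request path, ignoring any query string, is on the list."""
    return path.split("?", 1)[0] in allowlist


# Hypothetical export of published page paths, one per line.
allowlist = load_allowlist([
    "/content/domain_com/en-us/home.html",
    "/content/domain_com/en-us/menu.html",
])
```

Only requests for which `is_allowed` returns true would be forwarded; the list would need to be regenerated whenever pages are published or removed.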

Regards,

Peter

Level 5

Hi PuzanovsP​, it's both.

We are seeing some malicious traffic though it looks like hackers attempting to gather logins (wp-login.php) rather than actual DDoS.
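Probes like these never correspond to real content, so they are one class of request that could be rejected up front. A minimal sketch of such a filter, in plain Python and assuming the site serves no PHP at all (`BLOCKED_PATTERNS` and `is_blocked` are hypothetical names):

```python
import re

# Assumption: nothing on this site legitimately ends in .php,
# so any such request can be rejected without asking publish.
BLOCKED_PATTERNS = [
    re.compile(r"\.php$"),
]


def is_blocked(path):
    """True if the path, ignoring any query string, matches a deny pattern."""
    path_only = path.split("?", 1)[0]
    return any(pattern.search(path_only) for pattern in BLOCKED_PATTERNS)
```

This only covers requests we are certain are invalid; it does nothing for content paths that may or may not exist.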

We also have authors who distribute bad links. That's an organizational problem and not one that I can deal with (though we are working on it).

Are you saying that dispatcher does not have the functionality I am requesting?

Thanks!

John

Level 10

As a temporary workaround until you get a patch from Support, you could set the response status to 200 in 404.jsp. The contents of 404.jsp would then be served with a 200 status and therefore be cached by dispatcher.

Just make sure to evaluate the pros/cons of this approach.

Community Advisor

It's not a good idea to serve a 404 page with a 200 response; it would hurt SEO.



Arun Patidar

Level 5

Yeah, I appreciate the suggestion gauravb10066713​ but we won't be doing that. SEO may be one problem, but we also don't want users' browsers caching those responses as 200s.

Level 2

@jkpanera did you find a solution for this?