Hi Experts
Problem statement: I am reaching out for help understanding a baffling problem that affected 2 publishers in our production environment.
The error logs show the following ScriptEvaluationException:
Caused by: org.apache.sling.api.scripting.ScriptEvaluationException: An exception occurred processing JSP page /apps/<clientname>/components/page/business/page-business-basepage/redirect.jsp at line 47
Impact: Content pages built on the templates that include redirect.jsp continued to serve their content, but with a 404 response code. This caused downstream systems to malfunction for more than 10 hours and produced a huge surge in 404s.
Short-term resolution: The issue was resolved by resetting the fsclassloader (clearing its cache) and then restarting the instance, but we are still trying to understand how it could have happened in the first place.
More points from our diagnosis:
- This had never happened before and has not recurred since.
- We checked this file's change history; its last change is more than a year old.
- We ran queries in CRXDE to check whether anything had been modified at runtime, and everything was clean.
- The logs show script evaluation errors pointing to line 47 of redirect.jsp, but as soon as the cache was cleared and the instance restarted, the error went away.
- request.log shows a few admin sessions making POST /crx/de requests, but following up on them led to a dead end.
- The audit logs were empty. Perhaps they are set up incorrectly?
- There were no deployments during or before this period.
- It happened on both of our business servers at around the same time.
- We tried to reproduce this locally using the compiled redirect.jsp from the server, but the issue did not reproduce.
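To line up those POST /crx/de requests against the start of the 404 surge, a small script that pairs request and response lines in request.log may help. This is only a minimal sketch: it assumes the usual AEM request.log layout (`<timestamp> [<id>] -> <METHOD> <path> <protocol>` for requests and `<timestamp> [<id>] <- <status> ...` for responses), so the regular expressions may need adjusting for your logs. The function name `scan` is hypothetical.

```python
import re
from collections import Counter

# Assumed request.log line shapes (verify against your own logs):
#   09/Mar/2021:10:23:45 +0000 [42] -> POST /crx/de/... HTTP/1.1
#   09/Mar/2021:10:23:46 +0000 [42] <- 404 text/html 12ms
REQ = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<id>\d+)\] -> (?P<method>\S+) (?P<path>\S+) \S+")
RESP = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<id>\d+)\] <- (?P<status>\d{3})\b")


def scan(lines):
    """Pair request and response lines by request id (hypothetical helper).

    Returns (crxde_posts, not_found): crxde_posts is a list of
    (timestamp, path) tuples for POSTs under /crx/de, and not_found
    is a Counter of request paths that were answered with a 404.
    """
    pending = {}  # request id -> (method, path), awaiting its response line
    crxde_posts = []
    not_found = Counter()
    for line in lines:
        m = REQ.match(line)
        if m:
            method, path = m["method"], m["path"]
            pending[m["id"]] = (method, path)
            if method == "POST" and path.startswith("/crx/de"):
                crxde_posts.append((m["ts"], path))
            continue
        m = RESP.match(line)
        if m and m["id"] in pending:
            _, path = pending.pop(m["id"])
            if m["status"] == "404":
                not_found[path] += 1
    return crxde_posts, not_found


if __name__ == "__main__":
    import sys

    # Usage: python scan_requestlog.py crx-quickstart/logs/request.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        posts, nf = scan(fh)
    for ts, path in posts:
        print(ts, "POST", path)
    for path, count in nf.most_common(20):
        print(count, path)
```

Comparing the earliest 404 timestamps with the timestamps of the POST /crx/de requests would at least show whether the two are related in time.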
Questions:
- Is it safe to clear the fsclassloader cache before a restart?
- What can trigger an fsclassloader cache refresh?
- Can the fsclassloader cause issues with deprecated JSP usage?
- Are there any further leads on where we could look, or what we could dig into, to narrow down the root cause?
Happy to share more data if required.
Regards
Nikhil