
SOLVED

AEM Cloud Service Sites: Intermittent 403 in Auth Checker Servlet in Production


Level 2

Dear Members,

We are experiencing an intermittent issue on our production publish tier. We have enabled Closed User Group (CUG) and authentication for certain form pages. When a restricted page is accessed, the user is redirected to the SSO login page. After login, the SAML login call succeeds and the user is redirected to the requested page. However, in the Auth Checker servlet we observe that the request session sometimes has read access and sometimes does not. When the check returns 403, the user is shown a 404 page even though they have access to the resource.

Does anyone have any insights into why this is happening?

I am attaching the access log. It shows that the requests go to the same publish pod, yet the response codes vary: some are 200 and others are 403.

[Screenshot: access log]

11 Replies


Community Advisor

Hi @KartikKarnayilDC 
Can you check the login-token cookie from the request object in your servlet?

Please check for both 403 and 200 responses and compare.

If the AEM session cookie is present, authentication should work.

What happens if you bypass the Dispatcher? Does it work every time?



Arun Patidar
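One way to do the comparison suggested above is to capture the raw Cookie header of a request that returned 200 and one that returned 403 (from the browser's developer tools or a log statement in the servlet) and check whether both carried the same login-token. The following is a minimal sketch using only Python's standard http.cookies module; the helper names are hypothetical, and it only compares cookie values, it does not validate the token against AEM.

```python
from http.cookies import SimpleCookie


def login_token(cookie_header):
    """Return the value of the login-token cookie, or None if it is absent."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("login-token")
    return morsel.value if morsel is not None else None


def compare_tokens(header_ok, header_forbidden):
    """Summarise whether the 200 and 403 requests carried the same login-token."""
    ok = login_token(header_ok)
    forbidden = login_token(header_forbidden)
    if ok is None or forbidden is None:
        return "login-token missing in at least one request"
    if ok != forbidden:
        return "login-token differs between requests"
    return "same login-token in both requests"
```

If both requests carry the same token, the cookie itself is an unlikely cause, which suggests looking at the server-side state of the user instead.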


Level 2

Thanks, Arun. I will check from the Java side. I had already checked the request headers in the browser, and the login-token cookie was present.


Community Advisor

Hi @KartikKarnayilDC 
If the login-token is present and valid, the user should be authenticated. The only thing I can suspect is multiple publish instances combined with sticky sessions.



Arun Patidar


Level 2

Hi @arunpatidar, after checking the logs and the repository browser, we found a few new things:

17.06.2024 00:38:13.428 [cm-p110704-e1133840-aem-publish-76b5d9999b-xlh5r] *INFO* [qtp1682888838-1762] com.adobe.granite.security.user.internal.audit.AuditGroupAction User 'kkar882' was added to the group 'Contractor'
17.06.2024 00:41:08.309 [cm-p110704-e1133840-aem-publish-76b5d9999b-88hcf] *INFO* [qtp755966662-2099] com.adobe.granite.security.user.internal.audit.AuditGroupAction User 'kkar882' was removed from the group 'Contractor'
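Flip-flopping membership like this is easier to spot across a longer window if the audit lines are grouped per user and group. The sketch below assumes the line format shown above (timestamp, pod in brackets, log level, thread, logger, then the AuditGroupAction message) stays stable, and reports every add/remove event together with the pod that logged it. It is a log-reading aid written for this thread, not an AEM API.

```python
import re
from collections import defaultdict

# Matches AuditGroupAction lines in the format shown above (format is assumed stable).
AUDIT_RE = re.compile(
    r"^(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pod>[^\]]+)\] \*INFO\* \[[^\]]+\] "
    r"com\.adobe\.granite\.security\.user\.internal\.audit\.AuditGroupAction "
    r"User '(?P<user>[^']+)' was (?P<action>added to|removed from) the group '(?P<group>[^']+)'"
)


def membership_changes(lines):
    """Group add/remove events by (user, group), preserving log order."""
    changes = defaultdict(list)
    for line in lines:
        m = AUDIT_RE.match(line.strip())
        if m is None:
            continue  # not an AuditGroupAction line
        action = "added" if m["action"] == "added to" else "removed"
        changes[(m["user"], m["group"])].append((m["ts"], m["pod"], action))
    return dict(changes)


def flip_flops(lines):
    """Return (user, group) pairs that were both added and removed in the window."""
    return sorted(
        key for key, events in membership_changes(lines).items()
        if {action for _, _, action in events} == {"added", "removed"}
    )
```

Applied to the two lines above, flip_flops reports ('kkar882', 'Contractor'): added on the xlh5r pod and removed on the 88hcf pod about three minutes later.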


Also, whenever the issue reproduced, the rep:cache node did not contain all of the expected group principals.

Any idea why the user would be removed from the group?

The Contractor group comes from the SAML response. We have a cug-staff-group applied at the resource level, and the Contractor group is a member of cug-staff-group.
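The membership chain described here is what makes the CUG check sensitive to the Contractor group: the user reaches cug-staff-group only through Contractor. The toy model below (plain Python, not the Jackrabbit Oak API) resolves nested group membership and evaluates a CUG that lists cug-staff-group, showing that losing the single direct membership is enough to turn an allowed read into a denied one. The membership map is hypothetical, built from the names in this post, and real Oak evaluation also involves regular ACLs.

```python
def effective_principals(principal, member_of):
    """Return the principal plus every group it belongs to, directly or through nesting."""
    seen = {principal}
    stack = [principal]
    while stack:
        current = stack.pop()
        for group in member_of.get(current, ()):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return seen


def cug_allows_read(principal, member_of, cug_principals):
    """Toy CUG rule: read is allowed if any effective principal is listed in the CUG."""
    return not effective_principals(principal, member_of).isdisjoint(cug_principals)


# Hypothetical membership state, using the names from this thread.
member_of = {
    "kkar882": ["Contractor"],           # synced from the SAML response
    "Contractor": ["cug-staff-group"],   # nested membership configured in AEM
}
CUG = {"cug-staff-group"}
```

With the map as written, cug_allows_read("kkar882", member_of, CUG) is True; with member_of["kkar882"] set to an empty list, it is False.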



Correct answer by
Community Advisor

Hi @KartikKarnayilDC 
Are 76b5d9999b-xlh5r and 76b5d9999b-88hcf the same publish instance? Can you check the user profile and see which SAML groups you get?



Arun Patidar


Community Advisor

@KartikKarnayilDC

The logs are blurry; I am unable to read them.

1. Please ensure the secured content is not cached on the CDN (covered in Steps 5-7 of the blog).

2. Enable DEBUG logs; they will help you understand whether caching and the auth checker are working as expected.


Aanchal Sikka


Level 2

Hi Aanchal,

Yes, the Apache response headers are set to private for the restricted paths, so the responses are not cached on the CDN. We can see cache = MISS in the response header.

Also, this issue is only reproducible in the production environment. The functionality works as expected in the lower environments (stage, test).

Regarding the blurry logs, you should be able to zoom in on the image to read them clearly.
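The header check described in this reply (private Cache-Control, cache = MISS) can be repeated over many restricted URLs by classifying captured response headers in a script. The sketch below works on a plain dictionary of headers. The cache-status header name x-cache is an assumption, since the name depends on the CDN in front of the publisher; private and no-store are the Cache-Control directives that keep a response out of shared caches.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a set of lower-cased directive names."""
    return {part.split("=", 1)[0].strip().lower() for part in value.split(",") if part.strip()}


def restricted_response_ok(headers, cache_status_header="x-cache"):
    """True if the response is marked non-shareable and was not served from the CDN cache.

    `headers` maps header names to values; names are compared case-insensitively.
    `cache_status_header` is an assumption: adjust it to whatever your CDN emits.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    directives = parse_cache_control(lowered.get("cache-control", ""))
    not_shareable = bool(directives & {"private", "no-store"})
    status = lowered.get(cache_status_header.lower(), "")
    served_from_cache = "HIT" in status.upper()
    return not_shareable and not served_from_cache
```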


Community Advisor

Hi @KartikKarnayilDC ,

This issue could be caused by a number of factors, including caching issues, incorrect configuration of the Auth Checker servlet, or issues with the SSO login process. Here are a few steps you can take to troubleshoot the issue:

1. Check the configuration of the Auth Checker servlet to ensure that it is correctly configured to check for the appropriate permissions. You may want to review the documentation for the Auth Checker servlet to ensure that it is set up correctly.

2. Check the caching settings for the pages that are experiencing the issue. If the pages are being cached, a cached response may be served without a fresh permission check, so the result a user sees may not match their actual access. You may want to adjust the caching settings so that restricted pages are not cached, or are cached only briefly.

3. Check the SSO login process to ensure that it is functioning correctly. If there are issues with the SSO login process, it is possible that users are not being authenticated correctly, resulting in a 403 error.

4. Review the access logs to see if there are any patterns or trends that could help identify the root cause of the issue. You may want to look for any patterns in the times of day when the issue occurs, or any patterns in the types of requests that are resulting in a 403 error.

If none of these steps resolve the issue, you may want to consider reaching out to Adobe Support for further assistance.


Administrator

@KartikKarnayilDC Did you find the suggestions from users helpful? Please let us know if you require more information. Otherwise, please mark the answer as correct for posterity. If you've discovered a solution yourself, we would appreciate it if you could share it with the community. Thank you!



Kautuk Sahni


Level 2

Hi @kautuk_sahni
The issue is still not resolved; the engineering team is looking into it. I will update the thread once it is resolved.


Level 2

Team,

We've identified the root cause of why the user was being removed from the group and experiencing a Forbidden error.

The issue lies with the IdP's SAML response. Because the SAML response was encrypted, we had not inspected it. After disabling the encryption, we saw the group issue: the customer's IdP intermittently failed to send the group information. Sometimes the group information was included in the SAML response, and other times it was missing. When the group information is not sent, the user is removed from the group.
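Once the response is readable, checking each login for the group attribute can be scripted. The sketch below parses a decrypted SAML 2.0 response with Python's standard xml.etree.ElementTree and returns the values of the group attribute, or None when the attribute is absent, which is the case that caused the removals described here. The attribute name groupMembership is an assumption: use whatever name your IdP sends and your SAML authentication handler is configured to read.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}


def group_values(saml_xml, attribute_name="groupMembership"):
    """Return the list of values of the named SAML attribute, or None if it is missing."""
    root = ET.fromstring(saml_xml)
    for attribute in root.iter(f"{{{SAML_NS['saml']}}}Attribute"):
        if attribute.get("Name") == attribute_name:
            return [
                (value.text or "").strip()
                for value in attribute.findall("saml:AttributeValue", SAML_NS)
            ]
    return None  # attribute absent: nothing to sync groups from
```

Returning None rather than an empty list distinguishes "the IdP omitted the attribute" from "the IdP sent the attribute with no values".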


Thank you @arunpatidar, @aanchal-sikka, and @HrishikeshKa for your input.