POST requests intermittently fail with a connection timeout error

Level 5

We have a server that serves content to users, and we have now added login, registration and password-reset functionality to the existing site.

Most of the functionality for creating new users and logging in works out of the box (OOTB), i.e. the Geometrixx login and view/edit profile pages.

Everything was fine in our lower environments. However, now that it is on our higher environment, all POST requests (creating a new user, resetting a password, editing a profile) intermittently show a connection timeout on screen.

I haven't been able to identify any pattern so far; some requests simply time out at random.

What would be the best way to find out where the problem is?

The server layers are as follows:

Akamai server -> web server / dispatcher -> publish instance
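A minimal Java sketch (JDK 11+ java.net.http) for timing the same POST through each of the layers listed above, to see where the time is spent. The layer names, URLs and the empty form body are hypothetical placeholders; a request that exceeds the client-side timeout is reported with status -1. Point it at a harmless test endpoint rather than a real profile form on production.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

class LayerTimer {

    /** HTTP status (-1 if the client-side timeout expired) and elapsed wall-clock time. */
    static final class Timing {
        final int status;
        final long elapsedMillis;

        Timing(int status, long elapsedMillis) {
            this.status = status;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Sends one form-encoded POST and measures the time until the whole response body has been read. */
    static Timing timePost(HttpClient client, URI uri, String formBody, Duration timeout) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(timeout)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
        long start = System.nanoTime();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return new Timing(response.statusCode(), (System.nanoTime() - start) / 1_000_000);
        } catch (HttpTimeoutException e) {
            // No complete response within 'timeout' (this also covers connect timeouts).
            return new Timing(-1, (System.nanoTime() - start) / 1_000_000);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URLs: substitute the real Akamai domain, dispatcher host and publish host:port.
        Map<String, URI> layers = new LinkedHashMap<>();
        layers.put("akamai", URI.create("https://www.example.com/path/to/test-endpoint"));
        layers.put("dispatcher", URI.create("http://dispatcher.example.internal/path/to/test-endpoint"));
        layers.put("publish", URI.create("http://publish.example.internal:4503/path/to/test-endpoint"));

        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        for (Map.Entry<String, URI> layer : layers.entrySet()) {
            Timing t = timePost(client, layer.getValue(), "", Duration.ofSeconds(120));
            System.out.printf("%-10s status=%4d elapsed=%d ms%n", layer.getKey(), t.status, t.elapsedMillis);
        }
    }
}
```

Running it several times and comparing the rows shows whether the extra time only appears when the request goes through Akamai or the dispatcher, or whether publish itself is slow.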

Level 9

Does the request reach the publish instance? You can check the publish access log when a timed-out request happens.

--

Jitendra

Level 5

Hi Jitendra/Kautuk,

I got access to the logs and checked. The request does reach the publish server, but I get a connection timeout every time the POST takes longer than 60 seconds. If the POST request completes in 59 seconds it works fine; if it takes more than 60 seconds, I get a connection timeout on the page. After refreshing, things work fine.

Please let me know how I can get around this.

I see that the Day CQSE HTTP Service has a configuration property:

"Connection Timeout, Connection timeout in milliseconds. This property applies to both HTTP and HTTPS connections. Defaults to 60 seconds. "

https://docs.adobe.com/docs/en/cq/5-6-1/deploying/osgi_configuration_settings.html

Would increasing this 60-second default help?

Thanks in advance.

Regards,

asn

Level 9

Well, all I can say is that we should try it. The most important question is what kind of processing takes more than 60 seconds. Have you checked with someone from the infra team?

FYI: if you are using AEM 6.0/6.1, you have to configure this default connection timeout in the Apache Felix Jetty Based Http Service configuration.

---

Jitendra

Level 5

Hi Kautuk/Jitendra,

I have tried a lot of things, but I am still not able to pinpoint the main issue.

The login functionality we have is inherited from the Geometrixx login functionality.

View profile and edit profile are exactly the same as on the Geometrixx site. We only have validations on form fields in servervalidation.jsp.

All POST requests related to user profile actions take 40-50 seconds, and sometimes more than 60 seconds. This behavior is not consistent; it happens intermittently on the production instance.

We have 300 users on the production instance.

Any pointers are welcome:)

Thanks in advance.

Regards,

Level 9

Sorry to hear that. All I can say is that it would be really hard to guess the real issue. Why don't you try raising a Daycare ticket?

---

Jitendra

Level 5

Hi Scott,

No, we haven't yet. It's a production server, so we can't change anything until we have a clear RCA.

Interestingly, the connection timeout does not appear when I access the publish server directly (without the domain name or going through the web server).

But in any case 40-50 seconds is a long time.

The same happens with j_security_check as well.

Could it be that the /home/users path is getting locked by concurrent POST requests? POSTs to other paths seem to be working fine.
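One way to test the locking hypothesis is to compare the latency of a single POST with that of several identical POSTs released at the same moment. If writes under /home/users are serialized, the sorted concurrent latencies should climb in steps of roughly one single-request time. A minimal Java sketch under that assumption; the target URL and form body are passed on the command line and are hypothetical (ideally a test user's profile form on a non-production instance).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrencyProbe {

    /** Runs 'action' once per thread, releasing all threads together; returns sorted latencies in ms. */
    static List<Long> run(int threads, Callable<?> action) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch go = new CountDownLatch(1);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Callable<Long> timed = () -> {
                    go.await();                         // wait until every task has been submitted
                    long start = System.nanoTime();
                    action.call();
                    return (System.nanoTime() - start) / 1_000_000;
                };
                futures.add(pool.submit(timed));
            }
            go.countDown();                             // release all requests at once
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get());
            }
            Collections.sort(latencies);
            return latencies;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: hypothetical test URL; args[1]: optional form-encoded body
        URI target = URI.create(args[0]);
        String body = args.length > 1 ? args[1] : "";
        HttpClient client = HttpClient.newHttpClient();
        Callable<Integer> post = () -> client.send(
                HttpRequest.newBuilder(target)
                        .timeout(Duration.ofSeconds(120))
                        .header("Content-Type", "application/x-www-form-urlencoded")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build(),
                HttpResponse.BodyHandlers.discarding()).statusCode();

        System.out.println("1 request:      " + run(1, post) + " ms");
        System.out.println("10 in parallel: " + run(10, post) + " ms");
    }
}
```

If the parallel latencies stay close to the single-request time, a lock on the user path is a less likely explanation.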

Thanks in advance for your help.

Regards

Level 10

I recommend trying this setting change when you get permission. If that does not help, open a ticket; Support can search a database of issues and see whether this is a known one.

Level 5

Hi all,

I finally got access to the server and have been trying a lot of things to narrow it down. (We have also raised a Daycare ticket, but have not had much response so far.)

It looks like all POST requests sometimes get stuck somewhere.

When I go to the Siteadmin console and change page titles or other page properties (i.e. write to CRX), it sometimes takes 120 seconds or so.

Could it be because of the default bundle cache size in repository.xml?

We don't have a bundle cache section in repository.xml, so it defaults to 8 MB.

Thanks in advance for all help:)

Regards,

Level 4

Hi,

I am currently facing the same issue. I am using CQ 5.6.1.

When I log in directly on the publish instance, login works without any issues, but when I log in through the dispatcher, it sometimes shows j_reason=session_timed_out in the URL. I am not sure why this happens only through the dispatcher.

I am using an implementation similar to the one described at

http://sling.apache.org/documentation/the-sling-engine/authentication/authentication-authenticationh...

I have implemented it as:

public class RememberAuthenticationHandler implements AuthenticationHandler, AuthenticationFeedbackHandler { ... }

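In extractCredentials(), the credentials are built from the cookie (declared as AuthenticationInfo credentials = null; and assigned once the cookie has been read), and the handler registers itself as the authentication feedback handler:

credentials.put("$$sling.auth.AuthenticationFeedbackHandler$$", this);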

Sometimes login fails and authenticationFailed() is called. I am not sure why this happens only through the dispatcher:

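@Override
public void authenticationFailed(HttpServletRequest request, HttpServletResponse response,
        AuthenticationInfo authInfo) {
    // Called by Sling when the credentials returned by extractCredentials() are rejected
    LOG.info("Authentication failed");
    // Clear the remember-me cookie so the browser stops sending it
    removeRememberMeCookie(request, response);
    // Forward the failure to the other feedback handler, if one is set
    if (authenticationFeedbackHandler != null) {
        authenticationFeedbackHandler.authenticationFailed(request, response, authInfo);
    }
}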

I have also set the following in dispatcher.any:

    # Hostname globbing for farm selection (virtual domain addressing)
    /virtualhosts
      {
      "www.abc.com/products/abc/*"
      }
    # CRQ000001297712 - Somos Online User Guide
    /sessionmanagement
      {
      /directory "/appl/webcache/session"
      /header "Cookie:login-token"
      /header "Cookie:remember-me"
      /timeout "300"
      }
    # The load will be balanced among these render instances
    /renders
      {
      /rend01
        {
        # Hostname or IP of the render
        /hostname "abc1.com"
        # Port of the render
        /port "4503"
        # Connect timeout in milliseconds, 0 to wait indefinitely
        /timeout "6000"
        }
      /rend02
        {
        /hostname "abc2.com"
        /port "4503"
        /timeout "6000"
        }
      }

Could you please tell me how you solved the problem?

Thanks

Employee Advisor

Hi,

In your case you need to systematically test each layer: whether the request arrives in time, whether the response goes out, and how much time passes in between. This is probably time-consuming, but everything else is just guesswork. AEM's request.log is quite helpful here, because it logs two lines per request: one when the request is received and one when the response is sent.
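A minimal Java sketch for pulling slow or unanswered POSTs out of request.log. It assumes the usual Sling request log shape, where each request writes a "->" line and a "<-" line tagged with the same request number in square brackets, and the response line ends with the duration (for example "[66] <- 200 text/html 797ms"); check a few lines of your own log first, since the format can differ. The 30-second threshold in main() is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlowRequestFinder {

    // Assumed line shapes (any timestamp prefix is ignored):
    //   ... [66] -> POST /home/users/a/profile.form.html HTTP/1.1
    //   ... [66] <- 200 text/html 797ms
    private static final Pattern REQUEST = Pattern.compile("\\[(\\d+)\\] -> (\\S+) (\\S+)");
    private static final Pattern RESPONSE = Pattern.compile("\\[(\\d+)\\] <- (\\d{3}) .*?(\\d+)ms\\s*$");

    /** Lists requests with the given method that took at least thresholdMillis or never got a response line. */
    static List<String> find(List<String> lines, String method, long thresholdMillis) {
        Map<String, String> open = new LinkedHashMap<>(); // request id -> "METHOD path" awaiting a response
        List<String> report = new ArrayList<>();
        for (String line : lines) {
            Matcher req = REQUEST.matcher(line);
            if (req.find()) {
                open.put(req.group(1), req.group(2) + " " + req.group(3));
                continue;
            }
            Matcher res = RESPONSE.matcher(line);
            if (res.find()) {
                String request = open.remove(res.group(1));
                long millis = Long.parseLong(res.group(3));
                if (request != null && request.startsWith(method + " ") && millis >= thresholdMillis) {
                    report.add(request + " -> " + res.group(2) + " in " + millis + " ms");
                }
            }
        }
        // Anything still open at the end of the file never finished (or was still running).
        for (String request : open.values()) {
            if (request.startsWith(method + " ")) {
                report.add(request + " -> no response logged");
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a copy of the publish instance's request.log
        find(Files.readAllLines(Path.of(args[0])), "POST", 30_000).forEach(System.out::println);
    }
}
```

A POST that shows a short duration here while the browser still timed out points at a layer in front of publish (dispatcher or Akamai); a POST that is slow here, or has no "<-" line at all, points at publish itself.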

kind regards,
Jörg

Level 7

Why are these simple operations taking 120 seconds? Have you checked the performance with the infra team? Please also check whether any heavyweight query that might be running needs an index.

Most importantly, if this is solved, mark it as solved and share the solution with the community.

Thanks

Tuhin

Level 5

@srinivasc11017710 @Tuhin,

We have identified that one of the possible causes could be the bundle cache size.

  1. Checked repository.xml on publish.

No bundle cache size is set explicitly, so it defaults to 8 MB.

It is recommended to set it to at least 256 MB, in proportion to the -Xmx size given to the JVM.

https://helpx.adobe.com/experience-manager/kb/performancetuningtips.html#TIP01

In our case it is far below the recommended size.

The infra team is going to change the bundle cache size to see if that helps.
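Before and after the infra change, the effective setting can be checked with a small Java sketch that reads repository.xml and reports the bundle cache size of each PersistenceManager, falling back to the 8 MB default when no parameter is set. It assumes the setting is written as <param name="bundleCacheSize" value="..."/> (value in MB) inside a <PersistenceManager> element; verify the exact element and placement for your CRX version against the KB article linked above. The 256 MB threshold mirrors that article's recommendation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BundleCacheCheck {

    static final int DEFAULT_MB = 8;        // used when no bundleCacheSize param is present
    static final int RECOMMENDED_MB = 256;  // minimum recommended in the KB article

    /** Bundle cache size in MB for each PersistenceManager, in document order. */
    static List<Integer> bundleCacheSizes(InputSource xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // repository.xml may declare a DOCTYPE; do not try to download the DTD.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(xml);

        List<Integer> sizes = new ArrayList<>();
        NodeList managers = doc.getElementsByTagName("PersistenceManager");
        for (int i = 0; i < managers.getLength(); i++) {
            Element manager = (Element) managers.item(i);
            int size = DEFAULT_MB;
            NodeList params = manager.getElementsByTagName("param");
            for (int j = 0; j < params.getLength(); j++) {
                Element param = (Element) params.item(j);
                if ("bundleCacheSize".equals(param.getAttribute("name"))) {
                    size = Integer.parseInt(param.getAttribute("value").trim());
                }
            }
            sizes.add(size);
        }
        return sizes;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the publish instance's repository.xml
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        for (int size : bundleCacheSizes(source)) {
            System.out.println("bundleCacheSize = " + size + " MB"
                    + (size < RECOMMENDED_MB ? "  (below the recommended " + RECOMMENDED_MB + " MB)" : ""));
        }
    }
}
```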

@srinivasc11017710, check whether all POST requests show this behavior; for example, try changing the title of a page in Siteadmin several times to confirm it. In your case, measure the exact time after which the request times out; the IE developer tools show this clearly.

Level 7

Cheers. Then kindly mark it as fixed.

Thanks

Tuhin

Administrator

Hi,

I would say you need to debug why it is taking so much time; some performance changes will be required.

Changing the connection timeout is a workaround, and I am not in favor of it. Nobody would wait a minute to get into a page.

I would say you should try to debug the issue, and if you are not able to resolve this behavior, reach out to Support: https://daycare.day.com/public/contact.html

Thanks and Regards

Kautuk Sahni


