Hi Team,
The issue we are facing is that Google's crawler is unable to index the post-login pages of our site. Has anyone else faced this issue? I have not found anything concrete to fix it so far. Any pointers are appreciated.
We have configured a sitemap for our site, and it contains all the post-login pages. It looks like when Google tries to crawl a post-login page, our dispatcher settings redirect it to the login page (since it is a post-login page), so Google never indexes the post-login pages.
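One way to confirm what an anonymous crawler receives is to request a post-login URL without following redirects and inspect the status code and `Location` header. The sketch below only classifies such a response. The function name, the `/login` path, and the sample URL are assumptions for illustration and are not taken from the actual dispatcher configuration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical login path; replace with the path your dispatcher redirects to.
LOGIN_PATH = "/login"

def crawler_outcome(status: int, location: Optional[str] = None) -> str:
    """Classify what an unauthenticated crawler would get for one request."""
    if status == 200:
        return "content served"
    if status in (301, 302, 303, 307, 308) and location:
        # A redirect to the login page means the crawler never sees the gated content.
        if urlparse(location).path.startswith(LOGIN_PATH):
            return "redirected to login"
        return "redirected elsewhere"
    if status in (401, 403):
        return "access denied"
    return "other"

# Sample values only: a 302 pointing at the assumed login page.
print(crawler_outcome(302, "https://www.example.com/login?resource=/account/orders"))
```

If the post-login URLs listed in the sitemap all come back as "redirected to login", the sitemap entries alone will not get those pages indexed.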
@manisha594391 A page has to be publicly available to be crawled. If you hit the dispatcher URL and get redirected to the login page, the page will not be crawled or indexed.
But why would we want to index pages that require authentication in the first place? That should not be the case.
Below are the bare minimum requirements for any page to be crawled:
Accessible Website: The website must be accessible to Google's web crawlers (also known as Googlebot). This means the website should not block Googlebot's access using robots.txt or other methods.
XML Sitemap: Providing an XML sitemap helps Google discover and crawl pages more efficiently, especially for large or complex websites.
HTTPS Security: Google prioritizes secure websites (those using HTTPS) in search results. Implementing HTTPS encryption can positively impact crawling and indexing.
Robots.txt: While not always necessary, a properly configured robots.txt file can guide Googlebot on which parts of the site to crawl and which to ignore.
Redirects: Pages must not return a 302 redirect, though a 301 redirect will work.
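The robots.txt point in the list above can be checked offline with Python's standard `urllib.robotparser`. This is a minimal sketch: the rules and URLs are made-up examples, and the helper name `can_googlebot_fetch` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; the paths are assumptions, not taken from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
"""

def can_googlebot_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)

print(can_googlebot_fetch(ROBOTS_TXT, "https://www.example.com/products.html"))
print(can_googlebot_fetch(ROBOTS_TXT, "https://www.example.com/account/orders.html"))
```

Note that a robots.txt rule only says whether crawling is permitted. A page that is allowed by robots.txt but redirects to a login page is still not reachable by the crawler.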
Thanks @Imran__Khan for your input.
Our site was migrated from Magento, where Google was able to index the post-login pages; however, we are facing challenges implementing the same in AEM.
Google typically indexes web pages that are accessible to its crawlers. However, for pages that require authorization (such as login pages or those behind a paywall), Google's crawlers cannot access the content in the same way they do for public pages. To index content from pages that require authorization, website owners must provide Google with an alternative means of accessing this content.
Hi @BrianKasingli, thanks for the reply! Could you please point me to any examples or links on alternative means for Google's crawlers to reach post-login content?
I don't think there's a way to expose authenticated pages to search indexes and crawlers. However, if your content is not sensitive, you can make all those pages publicly accessible and then add JavaScript to hide the content when users are not logged in. This method still poses privacy issues, because people can read the content if they understand how to manipulate the JavaScript. Or you can try something called Closed User Group (CUG) in AEM.
@manisha594391 Did you try this?
Please follow the link below to resolve this issue.
I hope it helps!
@VeenaVikraman I saw the above post from you. Can you please provide your valuable input here?
Hi @Imran__Khan, I looked into this post, but it does not give much clarity on what needs to be done. It is quite a generic post, I believe.
@manisha594391 Agreed!
According to multiple blogs, gated content pages can be crawled using tools such as those mentioned in this blog:
https://www.google.com/amp/s/www.thinkific.com/blog/gated-content-strategy/amp/
Stay tuned, let me get more insight on this.
@manisha594391 I checked with a team that was planning to crawl gated content. @BrianKasingli is absolutely correct: there is currently no way for us to crawl gated content pages. They created some users and an API on our end to help them authenticate, and asked them to send their requests to us with some headers that would help us identify them. But they didn't have that capability.
Note: As suggested earlier, a third-party crawling engine that can pass authentication can be used to crawl these pages, and this is how they resolved the issue.
Special thanks to @VeenaVikraman for quick response.
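For anyone evaluating a third-party crawler, the header-based identification described above could look roughly like the following on the crawler side. This is only a sketch under assumptions: the header name `X-Crawler-Token`, the token value, and the URL are hypothetical, since the thread does not name the real ones, and the request is built but never sent.

```python
from urllib.request import Request

# Hypothetical header name; the real header agreed with the site is not known.
CRAWLER_HEADER = "X-Crawler-Token"

def build_crawler_request(url: str, token: str) -> Request:
    """Build (but do not send) a GET request carrying the identification header."""
    request = Request(url, method="GET")
    request.add_header(CRAWLER_HEADER, token)
    return request

# Placeholder URL and token for illustration only.
req = build_crawler_request("https://www.example.com/account/orders.html", "example-token")
print(req.get_method(), req.full_url)
print(req.header_items())
```

The site side would then need to recognize that header and serve the page instead of redirecting to login, which is the part a general-purpose search crawler cannot be configured to do.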
Thanks @Imran__Khan for looking into this. Could you please point me to any Adobe article that confirms this?
Also, I am still investigating other third-party tools. I will let you know if I find anything concrete in that area.
@manisha594391 There is no official note on this from Adobe, which is why there are tools on the market for crawling gated content. If required, you can open an Adobe ticket, and they will respond that this is not supported.
There is nothing about crawling gated content on Adobe's official website. If required, you can check the link below:
https://experienceleague.adobe.com/search.html#q=Crawl%20aem&sort=relevancy