Hi Team
So I recently raised a question https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/how-can-i-enable-private-g... about an issue I was facing. I am now rethinking the solution approach I was asked to follow, and I would like some expert opinions on the best way to handle this issue.
Problem: We have a third-party search engine API that needs to crawl all our pages and index them. We have integrated SAML authentication, and some of the pages have authentication enabled. We need the search engine to crawl these pages too.
Solution suggested: The approach I was asked to implement was to detect the crawler by its User-Agent and bypass authentication for those requests.
My thought process was to write a Filter to do exactly that (a sketch follows below). However, some SMEs pointed out to me that if someone knows the User-Agent of the crawling API, relying on the User-Agent alone is a potential security issue, since the header is trivial to spoof.
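For context, here is a minimal sketch of the kind of Filter I had in mind, just to make the weakness concrete. The User-Agent string and component setup are assumptions for illustration, not our real configuration; note also that a request-scope filter runs after Sling authentication, so a real bypass would have to hook in at the authentication-handler level.

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import org.apache.sling.api.SlingHttpServletRequest;
import org.osgi.service.component.annotations.Component;

// Illustrative only: "ExampleCrawler/1.0" is a made-up User-Agent value.
@Component(service = Filter.class,
        property = {"sling.filter.scope=REQUEST", "service.ranking:Integer=100"})
public class CrawlerUserAgentFilter implements Filter {

    private static final String CRAWLER_USER_AGENT = "ExampleCrawler/1.0";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        if (request instanceof SlingHttpServletRequest) {
            String userAgent = ((SlingHttpServletRequest) request).getHeader("User-Agent");
            // The SMEs' objection in one line: anyone who knows this string
            // can send it, so this check alone proves nothing about the caller.
            if (CRAWLER_USER_AGENT.equals(userAgent)) {
                // ... treat the request as the crawler (skip the auth redirect, etc.) ...
            }
        }
        chain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}
```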
I am looking for the right (or at least a better) option for this problem, and I am open to all suggestions.
I currently have a few suggestions from SMEs that I am evaluating:
1. Sandeep has suggested one approach that I think could be a good way to achieve this. Read here
2. I am also checking whether this is possible: https://stackoverflow.com/a/1382668/8671041
Thanks
Veena ✌
Hi @VeenaVikraman ,
IP whitelisting along with user authorization would help you achieve this:
1. Create a specific user with the required access rights for the third party.
2. Whitelist the IP range(s) used by the third-party application's requests.
3. The third-party application sends its requests along with user authorization.
4. Based on the IP and the user authorization, the request is processed (see the sketch below).
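A rough sketch of what I mean is below. The IP addresses and the user ID are placeholders; also, behind a dispatcher or load balancer you would need to read the forwarded client IP rather than the connection address.

```java
import java.io.IOException;
import java.util.Set;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.osgi.service.component.annotations.Component;

// Placeholder values: use the real crawler IP range and a dedicated user here.
@Component(service = Filter.class,
        property = {"sling.filter.scope=REQUEST", "service.ranking:Integer=100"})
public class CrawlerAccessFilter implements Filter {

    private static final Set<String> ALLOWED_IPS = Set.of("203.0.113.10", "203.0.113.11");
    private static final String CRAWLER_USER_ID = "search-crawler";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        SlingHttpServletRequest req = (SlingHttpServletRequest) request;
        SlingHttpServletResponse resp = (SlingHttpServletResponse) response;

        // Caution: getRemoteAddr() returns the dispatcher's IP when fronted by
        // one; in that case use the forwarded client IP your setup provides.
        String remoteIp = req.getRemoteAddr();
        String userId = req.getResourceResolver().getUserID();

        // Both factors must match: the crawler user may only come from known IPs.
        if (CRAWLER_USER_ID.equals(userId) && !ALLOWED_IPS.contains(remoteIp)) {
            resp.sendError(403, "Forbidden");
            return;
        }
        chain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}
```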
Hello @VeenaVikraman
I am not an AEM expert, so I don't know how this would be implemented in AEM.
But you already have most of a solution: instead of checking the User-Agent, check the IP of the visitor/crawler. Get the range of IPs from the third-party crawler and add them to an approved list. If a request comes from one of those IPs, bypass authentication; otherwise, ask for authentication. A CIDR-matching sketch for the approved list follows below.
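Since crawler vendors usually publish their ranges in CIDR notation, a small matcher like the sketch below could back that approved list. The ranges shown are documentation placeholders (RFC 5737), and this handles IPv4 only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

// Minimal IPv4 CIDR matcher for an approved-list check.
public class CrawlerIpAllowList {

    // Placeholder ranges: substitute the crawler vendor's published CIDRs.
    private static final List<String> APPROVED_CIDRS =
            List.of("203.0.113.0/24", "198.51.100.0/25");

    public static boolean isApproved(String remoteIp) {
        return APPROVED_CIDRS.stream().anyMatch(cidr -> matches(cidr, remoteIp));
    }

    private static boolean matches(String cidr, String ip) {
        try {
            String[] parts = cidr.split("/");
            int prefix = Integer.parseInt(parts[1]);
            int network = toInt(InetAddress.getByName(parts[0]).getAddress());
            int address = toInt(InetAddress.getByName(ip).getAddress());
            // Zero out the host bits on both sides and compare the network bits.
            int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
            return (network & mask) == (address & mask);
        } catch (UnknownHostException | NumberFormatException | ArrayIndexOutOfBoundsException e) {
            return false; // malformed CIDR or IP: fail closed
        }
    }

    private static int toInt(byte[] bytes) {
        int value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }
}
```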
We achieved this the other way around.
We had a solution to replicate/post page data to the search engine (Solr) on replication, so there was no crawling; the index was updated with each page publication (a sketch of such a listener follows below).
We also had a UI for bulk publishing to Solr.
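A rough sketch of such a push-on-publish listener is below. The Solr URL and core name are placeholders, and a real version would resolve the page and extract its title/body before posting, plus handle errors on the response.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import com.day.cq.replication.ReplicationAction;
import com.day.cq.replication.ReplicationActionType;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventConstants;
import org.osgi.service.event.EventHandler;

// Sketch only: the Solr endpoint below is a placeholder.
@Component(service = EventHandler.class,
        property = {EventConstants.EVENT_TOPIC + "=" + ReplicationAction.EVENT_TOPIC})
public class SolrPushOnPublish implements EventHandler {

    private static final String SOLR_UPDATE_URL =
            "http://solr.example.com:8983/solr/pages/update?commit=true";

    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public void handleEvent(Event event) {
        ReplicationAction action = ReplicationAction.fromEvent(event);
        if (action == null || action.getType() != ReplicationActionType.ACTIVATE) {
            return; // only push on page activation (publish)
        }
        // We index just the path here; a real implementation would resolve the
        // page and send its extracted content as the document fields.
        String json = "[{\"id\":\"" + action.getPath() + "\"}]";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(SOLR_UPDATE_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        // Fire-and-forget for brevity; production code should check the result.
        http.sendAsync(request, HttpResponse.BodyHandlers.discarding());
    }
}
```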
I am not sure if this is possible for you, but another solution could be to use Basic authentication for the crawler to bypass the filter (see the sketch below).
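Very roughly, that check could look like the sketch below. The user name and secret are placeholders; a real implementation would validate against securely stored credentials rather than compiled-in constants.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.servlet.http.HttpServletRequest;

// Sketch only: "search-crawler" / "change-me" are made-up placeholder values.
public final class BasicAuthCheck {

    private static final String CRAWLER_USER = "search-crawler";
    private static final String CRAWLER_SECRET = "change-me";

    public static boolean isCrawler(HttpServletRequest request) {
        String header = request.getHeader("Authorization");
        if (header == null || !header.startsWith("Basic ")) {
            return false;
        }
        try {
            // Basic auth payload is "user:password", Base64-encoded.
            String decoded = new String(
                    Base64.getDecoder().decode(header.substring("Basic ".length())),
                    StandardCharsets.UTF_8);
            return (CRAWLER_USER + ":" + CRAWLER_SECRET).equals(decoded);
        } catch (IllegalArgumentException e) {
            return false; // malformed Base64
        }
    }
}
```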
Thanks @arunpatidar . We don't need to submit data to the search engine. We need to allow them to crawl all our pages, irrespective of whether they are public or CUG-enabled.