Hi team,
We have a requirement to fetch/crawl all data from AEM and ingest it into our project.
What would be the best approach for this? Could you also share some useful links/videos?
Thank you,
Sriram
Hi @sriram_1, you did not mention what you are trying to achieve with this data. If the goal is search, you should probably go for a third-party search engine such as Solr. But for the sake of an answer:
There are two ways you can do it.
1. Iterate over pages/assets/users and prepare the result.
2. Use a query, either Query Builder or JCR-SQL2. Write a service that executes the query and returns the result.
I am sharing examples of both. But first you have to get a Resource Resolver as below.
1. Some example code for iteration. Make sure your Resource Resolver has the proper permissions to access the required data/content/pages/users.
Getting pages
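PageManager pageManager = resourceResolver.adaptTo(PageManager.class);
// getPage returns null when the path is not a cq:Page, so point it at your site root
Page page = pageManager != null ? pageManager.getPage("/content") : null;
if (page != null) {
    // second argument true: walk all descendants, not only direct children
    Iterator<Page> childPages = page.listChildren(null, true);
    while (childPages.hasNext()) {
        Page childPage = childPages.next();
        // process childPage here
    }
}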
Getting users and groups and printing them in the logs. Adapt as per your need.
ResourceResolver resourceResolver = ResolverUtil.newResolver(resourceResolverFactory);
Session session = resourceResolver.adaptTo(Session.class);
UserManager userManager = ((JackrabbitSession) session).getUserManager();

Iterator<Authorizable> userIterator = userManager.findAuthorizables("jcr:primaryType", "rep:User");
LOG.info("\n ----------GETTING USERS-------------");
while (userIterator.hasNext()) {
    Authorizable user = userIterator.next();
    LOG.info("\n User : {}", user.getPath());
}

Iterator<Authorizable> systemUserIterator = userManager.findAuthorizables("jcr:primaryType", "rep:SystemUser");
LOG.info("\n ----------GETTING System USERS-------------");
while (systemUserIterator.hasNext()) {
    Authorizable serviceUser = systemUserIterator.next();
    LOG.info("\n Service User : {}", serviceUser.getPath());
}

Iterator<Authorizable> groupIterator = userManager.findAuthorizables("jcr:primaryType", "rep:Group");
LOG.info("\n ----------GETTING Groups-------------");
while (groupIterator.hasNext()) {
    Authorizable group = groupIterator.next();
    LOG.info("\n Group : {}", group.getPath());
}
2. Some sample queries and code implementations.
Query Builder queries to get pages and assets. I am sharing the simplest ones. Create your own as per your need.
/* --- To get assets --- */
path=/content/dam
type=dam:Asset
p.limit=-1

/* --- To get pages --- */
/* --- Adjust type as per your content --- */
path=/content
type=cq:PageContent
p.limit=-1
How to implement it in the backend
@Reference
QueryBuilder queryBuilder;

Map<String, String> queryMap = new HashMap<>();
queryMap.put("path", "/content/dam/we-retail");
queryMap.put("type", "dam:Asset");
queryMap.put("p.limit", Long.toString(-1)); // -1 returns all hits
final Session session = resourceResolver.adaptTo(Session.class);
Query query = queryBuilder.createQuery(PredicateGroup.create(queryMap), session);
SearchResult result = query.getResult();
int perPageResults = result.getHits().size();
long totalResults = result.getTotalMatches();
List<Hit> hits = result.getHits();
for (Hit hit : hits) {
    try {
        // hit.getResource() can throw RepositoryException
        Asset asset = hit.getResource().adaptTo(Asset.class);
        if (asset != null) {
            LOG.info("\n Asset {} ", asset.getPath());
        }
    } catch (RepositoryException e) {
        LOG.error("Could not read search hit", e);
    }
}
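With p.limit=-1 the whole result set is loaded in one go, which can get expensive on a large repository. A minimal sketch of building the same predicate map one page at a time, using the standard Query Builder parameters p.offset, p.limit and p.guessTotal. The class and method names (PagedPredicates, forPage) and the page size of 500 are only illustrative assumptions, not part of any AEM API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds Query Builder predicate maps for one result page at a time. */
final class PagedPredicates {

    private PagedPredicates() {
    }

    /**
     * @param path     repository path to search under
     * @param type     JCR node type, for example dam:Asset
     * @param offset   zero-based index of the first hit to return
     * @param pageSize maximum number of hits in this page
     */
    static Map<String, String> forPage(String path, String type, long offset, int pageSize) {
        if (offset < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and pageSize must be > 0");
        }
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("path", path);
        predicates.put("type", type);
        predicates.put("p.offset", Long.toString(offset));
        predicates.put("p.limit", Integer.toString(pageSize));
        // Avoids counting every match up front, which is costly on big result sets
        predicates.put("p.guessTotal", "true");
        return predicates;
    }

    public static void main(String[] args) {
        // First two pages of 500 assets each
        System.out.println(forPage("/content/dam/we-retail", "dam:Asset", 0, 500));
        System.out.println(forPage("/content/dam/we-retail", "dam:Asset", 500, 500));
    }
}
```

Each map can be passed to PredicateGroup.create(...) as in the code above. Increase the offset by the page size until a page comes back with fewer hits than the page size.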
In case you use JCR-SQL2
String searchPath = "/content/we-retail";
// The path passed to ISDESCENDANTNODE must be a quoted string literal
String sql2Query = "SELECT * FROM [cq:PageContent] AS node WHERE ISDESCENDANTNODE(node, '" + searchPath + "') ORDER BY node.[jcr:title]";
ResourceResolver resourceResolver = ResolverUtil.newResolver(resourceResolverFactory);
final Session session = resourceResolver.adaptTo(Session.class);
final javax.jcr.query.Query query = session.getWorkspace().getQueryManager().createQuery(sql2Query, javax.jcr.query.Query.JCR_SQL2);
final QueryResult result = query.execute();
NodeIterator pages = result.getNodes();
JSONArray resultArray = new JSONArray();
while (pages.hasNext()) {
    Node page = pages.nextNode();
    // read the properties you need from page and add them to resultArray
}
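Since the search path ends up inside the query string, it has to be a properly quoted JCR-SQL2 string literal, and a single quote inside the value has to be doubled. A small sketch of a helper that does this. The names Sql2Queries, quote and descendantsOf are made up for illustration; only the query syntax itself is JCR-SQL2.

```java
/** Hypothetical helper: builds a JCR-SQL2 descendant query with a safely quoted path. */
final class Sql2Queries {

    private Sql2Queries() {
    }

    /** Wraps a value in single quotes and doubles any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Selects all nodes of the given type below the given path, ordered by jcr:title. */
    static String descendantsOf(String nodeType, String path) {
        return "SELECT * FROM [" + nodeType + "] AS node"
                + " WHERE ISDESCENDANTNODE(node, " + quote(path) + ")"
                + " ORDER BY node.[jcr:title]";
    }

    public static void main(String[] args) {
        System.out.println(descendantsOf("cq:PageContent", "/content/we-retail"));
    }
}
```

The returned string can be handed to QueryManager.createQuery(..., Query.JCR_SQL2) exactly as in the code above.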
These are just code samples. Get a Resource Resolver with the proper permissions.
Hi @SantoshSai
Kind of data: users, groups, sites, assets, forms, screens...
Goal: ingest data from AEM into our project for search optimization.
Thanks
Sriram
Usually in AEM we don't share data related to users, groups, etc.
However, if you wish to expose that data, there is an HTTP API you can refer to.
In terms of search optimization, here are a few links from Adobe as well as a few from third parties.
To ensure that crawlers crawl our website, we need a sitemap.xml and a robots.txt that points the crawler to the corresponding sitemap.xml. Please refer to robots.txt.
Sitemap Generator: https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html
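To illustrate how robots.txt points a crawler to the sitemap, here is a rough sketch that renders a robots.txt body with a Sitemap directive. The class name RobotsTxt, the example.com URL and the disallowed paths are assumptions for illustration only. In a real project the file is usually served by the dispatcher/web server or by a servlet.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: renders a robots.txt body that advertises the sitemap location. */
final class RobotsTxt {

    private RobotsTxt() {
    }

    static String render(String sitemapUrl, List<String> disallowedPaths) {
        StringBuilder sb = new StringBuilder("User-agent: *\n");
        for (String path : disallowedPaths) {
            sb.append("Disallow: ").append(path).append('\n');
        }
        // The Sitemap directive tells crawlers where to find sitemap.xml
        sb.append("Sitemap: ").append(sitemapUrl).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // example.com and the paths below are placeholders
        System.out.print(render("https://www.example.com/sitemap.xml", Arrays.asList("/bin/", "/libs/")));
    }
}
```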
Usually, web search is implemented by crawling, or through predefined index files (connectors), for the website's pages/forms/assets or any third-party system such as ServiceNow, Salesforce, etc., and is served to end users through search engines such as Solr, Lucidworks Fusion, Attivio, Coveo, etc.
I am not clear on the following:
a) Why do you want to search across users and groups (that is, expose your users and groups to end users)?
b) Does your site provide an authenticated experience?
Please clarify.
Regards,
Rajashankar.R
Which search engine are you trying to use?
Looking into this, I have one question, @sunil_kumar_. I believe the original request was about fetching/crawling AEM data. Considering AEM best practices, I don't think we expose data such as what you mentioned above to any search engine. Nor do I understand the idea of running costly queries just to crawl. As I understand it, crawling is about search engine optimization. Correct me if I'm wrong.
@SantoshSai If you read my answer, I mentioned that I was adding the code samples for the sake of an answer. As for exposing data, it all depends on the user's use case. If the client needs it, we have to do it. Maybe the client wants it for some internal portal; we don't know the exact use case. In one of the replies, the user mentioned that he is looking for users, groups, sites, assets, etc.
In the question, the user mentioned entire data. So, for the sake of providing all the information, I added this code. I am trying to add as much information as possible. Now let the user decide what he needs. This might help others as well.
We are here to help as much as possible. When the requirements are not clear, which is true in most cases, we should try to help with as much information as we can.