
SOLVED

fetching/crawling data from AEM


Level 4

Hi team,

We have a requirement to fetch/crawl all the data from AEM and ingest it into our project.

What would be the best approach for this? Could you also share some useful links/videos?

 

Thank you,

Sriram

 

1 Accepted Solution


Correct answer by
Level 7

Hi @sriram_1, you did not mention what you are trying to achieve with this data. If the goal is search, you should probably go for a third-party search engine such as Solr. But for the sake of an answer:
There are two ways you can do it.

1. Iterate over pages/assets/users and prepare the result.

2. Use a query, either Query Builder or SQL2. Write a service that executes the query in service code and returns the result.


I am sharing an example of both. But first you have to get a Resource Resolver, as described below.

https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/how-to-initialize-resource...

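1. Example code for iteration. Make sure your Resource Resolver has permission to access the required data/content/pages/users.
Getting pages

            // getPage() returns null when the path is not a cq:Page, which is typically the case for "/content" itself; use a site root such as "/content/we-retail" then.
            Page page = resourceResolver.adaptTo(PageManager.class).getPage("/content");
            if (page != null) {
                Iterator<Page> childPages = page.listChildren(null, true); // true = include all descendants
                while (childPages.hasNext()) {
                    Page childPage = childPages.next();
                }
            }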

Getting users and groups. This example prints them to the logs; adapt it as needed.

            ResourceResolver resourceResolver = ResolverUtil.newResolver(resourceResolverFactory);
            Session session = resourceResolver.adaptTo(Session.class);
            // The UserManager is only available on a Jackrabbit session
            UserManager userManager = ((JackrabbitSession) session).getUserManager();

            Iterator<Authorizable> userIterator = userManager.findAuthorizables("jcr:primaryType", "rep:User");
            LOG.info("\n ----------GETTING USERS-------------");
            while (userIterator.hasNext()) {
                Authorizable user = userIterator.next();
                LOG.info("\n User : {}", user.getPath());
            }

            Iterator<Authorizable> systemUserIterator = userManager.findAuthorizables("jcr:primaryType", "rep:SystemUser");
            LOG.info("\n ----------GETTING System USERS-------------");
            while (systemUserIterator.hasNext()) {
                Authorizable serviceUser = systemUserIterator.next();
                LOG.info("\n Service User : {}", serviceUser.getPath());
            }

            Iterator<Authorizable> groupIterator = userManager.findAuthorizables("jcr:primaryType", "rep:Group");
            LOG.info("\n ----------GETTING Groups-------------");
            while (groupIterator.hasNext()) {
                Authorizable group = groupIterator.next();
                LOG.info("\n Group : {}", group.getPath());
            }

2. Some sample queries and code implementations.
Query Builder queries to get pages and assets. These are the simplest ones; build your own as needed.

/* ---To get Assets----*/
path=/content/dam
type=dam:Asset
p.limit=-1

/* ---To get Pages----*/
/* ---Adjust type as per your content----*/
path=/content
type=cq:PageContent
p.limit=-1
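
Predicates like the ones above can also be tried over HTTP before writing any service code, because AEM exposes Query Builder through the JSON servlet at /bin/querybuilder.json. The following is only a minimal sketch: the class and method names are hypothetical, and the host http://localhost:4502 is an assumption (a default local author instance). It shows how a predicate map turns into a request URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of AEM: builds a Query Builder servlet URL.
final class QueryBuilderUrl {

    // URL-encodes each predicate as key=value and joins them with '&'.
    static String build(String host, Map<String, String> predicates) {
        String query = predicates.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return host + "/bin/querybuilder.json?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the predicates in insertion order.
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("path", "/content/dam");
        predicates.put("type", "dam:Asset");
        predicates.put("p.limit", "-1");
        // Host is an assumption: adjust to your own instance.
        System.out.println(build("http://localhost:4502", predicates));
    }
}
```

The request still has to be authenticated, and the same permission rules apply as for the Resource Resolver in the Java examples.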

How to implement this in the backend

            @Reference
            QueryBuilder queryBuilder;

            Map<String, String> queryMap = new HashMap<>();
            queryMap.put("path", "/content/dam/we-retail");
            queryMap.put("type", "dam:Asset");
            queryMap.put("p.limit", "-1"); // -1 returns all hits
            final Session session = resourceResolver.adaptTo(Session.class);
            Query query = queryBuilder.createQuery(PredicateGroup.create(queryMap), session);
            SearchResult result = query.getResult();
            int perPageResults = result.getHits().size();
            long totalResults = result.getTotalMatches();
            List<Hit> hits = result.getHits();
            for (Hit hit : hits) {
                Asset asset = hit.getResource().adaptTo(Asset.class);
                if (asset != null) { // adaptTo() returns null if the resource is not an asset
                    LOG.info("\n Asset {} ", asset.getPath());
                }
            }
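
With p.limit=-1 every hit is returned in a single result, which can be expensive on a large repository. One alternative is to read the result in pages using the Query Builder predicates p.offset and p.limit. The sketch below only prepares the predicate maps, one per page; the class name, method name, and page size are hypothetical, and the total is assumed to come from an earlier getTotalMatches() call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of AEM: prepares one predicate map per result page.
final class PagedPredicates {

    // Copies the base predicates and adds p.offset/p.limit for each page.
    static List<Map<String, String>> pages(Map<String, String> base, long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (long offset = 0; offset < total; offset += pageSize) {
            Map<String, String> page = new HashMap<>(base);
            page.put("p.offset", Long.toString(offset));
            page.put("p.limit", Integer.toString(pageSize));
            result.add(page);
        }
        return result;
    }
}
```

Each map can then be passed to PredicateGroup.create(...) in the same way as queryMap in the example above.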

If you use SQL2

            String searchPath = "/content/we-retail";
            // The path must be a quoted string literal inside ISDESCENDANTNODE
            String sql2Query = "SELECT * FROM [cq:PageContent] AS node WHERE ISDESCENDANTNODE(node, '" + searchPath + "') ORDER BY node.[jcr:title]";
            ResourceResolver resourceResolver = ResolverUtil.newResolver(resourceResolverFactory);
            final Session session = resourceResolver.adaptTo(Session.class);
            final javax.jcr.query.Query query = session.getWorkspace().getQueryManager().createQuery(sql2Query, javax.jcr.query.Query.JCR_SQL2);
            final QueryResult result = query.execute();
            NodeIterator pages = result.getNodes();
            JSONArray resultArray = new JSONArray();
            while (pages.hasNext()) {
                Node page = pages.nextNode();
                // add the data you need from each node to resultArray here
            }
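
When the search path comes from configuration or user input, it helps to build the SQL2 statement in one place so that the path is always quoted. A minimal sketch, assuming the same query shape as above; the class and method names are hypothetical. Single quotes in the path are doubled, which is how string literals are escaped in JCR-SQL2.

```java
// Hypothetical helper, not part of AEM or JCR: builds the SQL2 statement text only.
final class Sql2Statements {

    // Returns a statement selecting all nodes of nodeType below path, ordered by title.
    static String descendantsOf(String nodeType, String path) {
        // Escape single quotes by doubling them, then wrap the path in quotes.
        String quotedPath = "'" + path.replace("'", "''") + "'";
        return "SELECT * FROM [" + nodeType + "] AS node "
                + "WHERE ISDESCENDANTNODE(node, " + quotedPath + ") "
                + "ORDER BY node.[jcr:title]";
    }
}
```

The returned string can be passed to QueryManager.createQuery(...) together with javax.jcr.query.Query.JCR_SQL2, as in the example above.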

These are just code samples. Make sure you get a Resource Resolver with the proper permissions.

 


8 Replies


Community Advisor

Hi @sriram_1 ,

It would be really appreciated if you could elaborate on what kind of data you need and for what purpose, so that we can understand the requirement better.

Regards,

Santosh 


Level 4

Hi @SantoshSai 

 

Kind of data: users, groups, sites, assets, forms, screens...

 

Purpose: ingest data from AEM into our project for search optimization.

 

Thanks

Sriram


Community Advisor

@sriram_1 

Usually in AEM we don't share data related to users, groups, etc.

However, if you wish to expose that data, there is an HTTP API you can refer to.

In terms of search optimization, here are a few links from Adobe as well as a few from third parties.

To ensure that crawlers crawl the website, we need a sitemap.xml and a robots.txt that points the crawler to the corresponding sitemap.xml. Please refer to robots.txt.

Sitemap Generator: https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html


Level 2

@sriram_1 

Usually, web search is implemented by crawling, or through predefined index files (connectors), for the website pages/forms/assets or any third-party system such as ServiceNow, Salesforce, etc., and is served to end users by search engines such as Solr, LucidWorks Fusion, Atveo, Coveo, etc.

It is not clear to me:

a) why you want to search across users and groups (that is, expose your users and groups to end users).

b) whether your site provides an authenticated experience.

 

Please clarify.

 

Regards,

Rajashankar.R


Level 4

Which search engine are you trying to use?


Community Advisor

Looking into this, I have one question, @sunil_kumar_. I believe the original request was about fetching/crawling AEM data. Considering AEM best practices, I don't think we expose data such as you mentioned above to any search engine. Nor do I understand the idea of running costly queries just to crawl. As I understand it, crawling is about search engine optimization. Correct me if I'm wrong.


Level 7

@SantoshSai If you read my answer, I mentioned that I added this code sample for the sake of an answer. As for exposing data, it all depends on the user's use case. If the client needs it, we have to do it. Maybe the client is trying to get the data for some internal portal; we don't know the exact use case. In one of the replies, the user mentioned that he is looking for users, groups, sites, assets, etc.

In this question the user mentioned the entire data, so for the sake of providing all the information I added this code. I am trying to add as much information as possible; now let the user decide what he needs. This might help others as well.
We are here to help as much as possible. When requirements are not clear, which is true in most cases, we should try to help with as much information as we can.