SOLVED

Using Query Builder queries to fetch required paths and automate package creation to move the content between cloud instances


Level 4

Hi ,

 

I am analysing a solution for the following problem: we have to move content between two cloud instances in an automated way via packages. For example, if the content authoring team changes some number of pages over a span of 3 days, we need to find those pages, package them, and move them to the target instance. In other words, we want a selective sync.

 

The solution I am thinking of is to run the query below, get the paths of the modified pages, package them, and move them to the target instance:

path=/content/abc/mno/site
type=cq:Page
daterange.property=jcr:content/cq:lastModified
daterange.lowerBound=2024-07-22T00:00:00.000Z
daterange.lowerOperation=>=
daterange.upperBound=2024-07-24T00:00:00.000Z
daterange.upperOperation=<=
p.hits=selective
p.properties=jcr:path
p.limit=-1
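
For a scheduled job, the fixed dates in the query above would normally be computed at run time. The following is a minimal Python sketch, not a tested integration: it rebuilds the same predicates for a rolling window and encodes them for the Query Builder JSON servlet at /bin/querybuilder.json. The helper names (`modified_pages_predicates`, `querybuilder_url`, `extract_paths`) are hypothetical, and the response shape (a `hits` list whose entries carry `jcr:path`) is an assumption based on the `p.hits=selective` and `p.properties=jcr:path` settings.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso_utc(moment):
    """Format a timezone-aware datetime as e.g. 2024-07-22T00:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (utc.microsecond // 1000)


def modified_pages_predicates(root_path, days, now=None):
    """Build the thread's predicates for pages modified in the last `days` days."""
    upper = now or datetime.now(timezone.utc)
    lower = upper - timedelta(days=days)
    return {
        "path": root_path,
        "type": "cq:Page",
        "daterange.property": "jcr:content/cq:lastModified",
        "daterange.lowerBound": iso_utc(lower),
        "daterange.lowerOperation": ">=",
        "daterange.upperBound": iso_utc(upper),
        "daterange.upperOperation": "<=",
        "p.hits": "selective",
        "p.properties": "jcr:path",
        "p.limit": "-1",
    }


def querybuilder_url(host, predicates):
    """Encode the predicates as a GET URL for the Query Builder JSON servlet."""
    return f"{host}/bin/querybuilder.json?{urlencode(predicates)}"


def extract_paths(response):
    """Pull page paths out of a parsed JSON response (assumed shape, see above)."""
    return [hit["jcr:path"] for hit in response.get("hits", [])]
```

A Jenkins job could call `querybuilder_url(host, modified_pages_predicates("/content/abc/mno/site", 3))`, fetch the URL with whatever authentication the instance requires, and pass the parsed JSON to `extract_paths`.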

 

Now below are my questions:

 

1. At which path level should I run this query? We have a large number of pages, let's say more than 5,000, under the parent path.

2. If I have to run it at the parent level and traverse the nodes, should I increase the node traversal limit, tweak the query further, or tweak the index to handle this query? I do not want to run into the 100,000-node traversal exception.

3. For complete automation, what is the preferred way? I am thinking of the MCP utility or cURL scripts run via something like Jenkins jobs. The objective is maximum automation: we don't want authors to scratch their heads providing paths to the tech team for packages.

 

Would appreciate your thoughts on this 

 

Thanks 


1 Accepted Solution


Correct answer by
Level 9

Yes, it is being picked. Verified both in the SDK which has the same number of pages as well as on the actual cloud instance. So, there might be some issue with indexes in your project. Check if you have any custom indexes in your project which might be conflicting.


14 Replies


Community Advisor

@tnik 

Have you tried exploring https://adobe-consulting-services.github.io/acs-aem-commons/features/contentsync/index.html for incremental updates?

I guess it should meet your requirements of incremental updates.

 

A few suggestions:

- Preferably create separate packages for each site.

- Always create a backup of the target, and only then install the latest package.

- Make sure the package filters are set up properly. If you need just a page, make sure the filter does not affect its child pages.
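
To illustrate the last suggestion, here is a minimal Python sketch that generates a FileVault filter.xml in which each page contributes only its own node and its jcr:content subtree, so child pages are left alone. The function names are hypothetical, and the include patterns are one common way to express "page without children"; treat them as an assumption to verify against your own packages. Paths are assumed to contain no regex metacharacters, since include patterns are regular expressions.

```python
from xml.sax.saxutils import quoteattr


def page_only_filter(page_path):
    """One <filter> entry covering the page node and its jcr:content subtree only."""
    root = quoteattr(page_path)
    content = quoteattr(page_path + "/jcr:content(/.*)?")
    return (
        f"  <filter root={root}>\n"
        f"    <include pattern={root}/>\n"
        f"    <include pattern={content}/>\n"
        f"  </filter>"
    )


def workspace_filter(page_paths):
    """Render a complete filter.xml for the given pages (deduplicated and sorted)."""
    entries = "\n".join(page_only_filter(p) for p in sorted(set(page_paths)))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<workspaceFilter version="1.0">\n'
        f"{entries}\n"
        "</workspaceFilter>\n"
    )
```

Because the first rule of each filter is an include, anything under the root that matches neither pattern (child pages in particular) stays out of the package.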

 

 

We created a cURL utility for the sync and hosted it as an Azure Function.


Aanchal Sikka


Level 4

Hi @aanchal-sikka ,

 

I tried ACS Commons Content Sync in detail. The problem is this: say we have a site /content/mysite/en with some 1,000 pages, and we don't know how many pages under it have been modified in the last 10 days. If we run the utility on /content/mysite/en, it simply hangs, fails to build the catalogue, and cannot fetch the delta.

That is why I am thinking of using the above query at the root level (/content/mysite/en), but I need to optimise it so that it does not fail once more than 100,000 nodes are traversed.

 

Regarding the packages approach:

For the sync, I want to package the pages or child pages that were modified in a defined time period. Is that possible via cURL, or should I write a custom servlet that does all the query and package processing, and call it via cURL?

 

Also, this process has to be completely automated, as we have multiple sites to handle and do not want to handle the packages manually; that would not be scalable or maintainable.
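
As a rough illustration of the cURL-style option, the Python sketch below only plans the HTTP calls for one sync run; it sends nothing. The endpoint paths (/crx/packmgr/service/.json with cmd=create, build, upload and install, and /crx/packmgr/update.jsp for filters) follow commonly cited Package Manager cURL examples and are assumptions here, as are the function name and the filter payload shape. Verify them against your instances, along with authentication, before wiring this into a job.

```python
import json


def package_sync_plan(source, target, group, name, page_paths):
    """Return the ordered HTTP calls for building a package on `source`
    and installing it on `target`. Nothing is sent; this is only a plan."""
    pkg = f"/etc/packages/{group}/{name}.zip"
    service = "/crx/packmgr/service/.json"
    # Assumed payload shape: one filter root per page, no include/exclude rules.
    filters = json.dumps([{"root": p, "rules": []} for p in page_paths])
    return [
        {"step": "create", "method": "POST",
         "url": f"{source}{service}{pkg}?cmd=create",
         "form": {"packageName": name, "groupName": group}},
        {"step": "set filters", "method": "POST",
         "url": f"{source}/crx/packmgr/update.jsp",
         "form": {"path": pkg, "packageName": name, "groupName": group,
                  "filter": filters, "_charset_": "UTF-8"}},
        {"step": "build", "method": "POST",
         "url": f"{source}{service}{pkg}?cmd=build", "form": {}},
        {"step": "download", "method": "GET",
         "url": f"{source}{pkg}", "form": None},
        # The downloaded zip would be attached as the multipart "package" field.
        {"step": "upload", "method": "POST",
         "url": f"{target}{service}",
         "form": {"cmd": "upload", "force": "true"}},
        {"step": "install", "method": "POST",
         "url": f"{target}{service}{pkg}?cmd=install", "form": {}},
    ]
```

Keeping the plan separate from the code that executes it makes the sequence easy to log and review in a pipeline; a custom servlet would collapse the first three steps into a single call on the source instance.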

 

Thanks,

 


Community Advisor

Hi @tnik 
You can use Groovy to do a recursive tree search to avoid the node traversal issue, or you can look at the audit logs to identify page modification/creation.

https://adobe-consulting-services.github.io/acs-aem-commons/features/audit-log-search/index.html  



Arun Patidar


Level 4

Hi @arunpatidar , 

A couple of quick points:

 

1. Can you please give an example of a Groovy script that identifies the modified pages under the content tree?

 

2. I thought of trying ACS Audit Log Search, but it also fails, as it eventually uses a query in the back end. Also, we do not want to run ACS Commons manually: eventually we want to build packages from the returned paths, so this needs to be automated across multiple sites.

 

3. Also, for the query below, would you recommend updating the existing index or creating a new one so that it runs faster, or increasing the Oak query limit config to, say, 500,000 instead of 100,000?

path=/content/abc/mno/site
type=cq:Page
daterange.property=jcr:content/cq:lastModified
daterange.lowerBound=2024-07-22T00:00:00.000Z
daterange.lowerOperation=>=
daterange.upperBound=2024-07-24T00:00:00.000Z
daterange.upperOperation=<=
p.hits=selective
p.properties=jcr:path
p.limit=-1

 

Thanks,


Community Advisor

Hi @tnik 
Please find a sample Groovy script below. It is only a sample and might not work as is.

 

 

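// Sample for the AEM Groovy Console (getPage and session come from its binding).
import org.apache.jackrabbit.vault.fs.api.PathFilterSet
import org.apache.jackrabbit.vault.fs.config.DefaultWorkspaceFilter
import org.apache.jackrabbit.vault.packaging.PackagingService

path = "/content/abc/mno/site"
pageList = []
lowerBound = Date.parse("yyyy-MM-dd'T'HH:mm:ss.SSSX", "2024-07-22T00:00:00.000Z")
upperBound = Date.parse("yyyy-MM-dd'T'HH:mm:ss.SSSX", "2024-07-24T00:00:00.000Z")

// Walk the page tree instead of running a query, so the query traversal limit does not apply.
// page.lastModified is a Calendar (null if the page was never modified), hence ?.time.
getPage(path).recurse { page ->
    def modified = page.lastModified?.time
    if (modified && modified >= lowerBound && modified <= upperBound) {
        pageList.add(page.path)
    }
}

println "Found ${pageList.size()} modified pages"
pageList.each { println it }

def createPackage(List pageList) {
    def packageManager = PackagingService.getPackageManager(session)
    def jcrPackage = packageManager.create("my_packages", "mypackage", "1.0")

    // One filter root per modified page. Note: a page root also covers its child pages.
    def filter = new DefaultWorkspaceFilter()
    pageList.each { pagePath -> filter.add(new PathFilterSet(pagePath)) }
    jcrPackage.definition.setFilter(filter, true)

    // Build the package; null means no progress listener.
    packageManager.assemble(jcrPackage, null)
    jcrPackage.close()
}

// Only create a package when something actually changed.
if (pageList) {
    createPackage(pageList)
}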


 




Arun Patidar


Level 9

Have you considered using this https://adobe-consulting-services.github.io/acs-aem-commons/features/packagers/query-packager/index.... ? 
Since you already have a query, this will handle your package creation.
And since you have already limited your query to cq:Page, I would not expect you to run into the traversal exception (though there is no harm in creating the index).
The only remaining question is how you want to install this package, which can be done in several ways depending on your DevOps setup.


Level 4

Hi @h_kataria ,

Yes, I have explored this ACS AEM Commons utility. It fails when I run the above query on the topmost language node, but it works fine for the child paths.


Level 9

We have 6000+ pages and I ran your query for it and didn't find any issue. So, not sure why it would fail for a use case of 5000 pages. You can check if there is any other error which comes in the logs while you try to create the package.


Level 4

Hi @h_kataria ,

When I run my query at the parent level, it breaks with the 100,000-node traversal exception in both Query Builder and the ACS AEM Commons packager utility. Are you running it at the root level of your site (/content/abc/site/en)?


Level 9

Yes, I ran it at the root level. I first checked the total number of pages using a query, which came out to 6162, and then ran your query and got proper results back.
So maybe cross-check your total page count once. It is probably more than you estimated.


Level 4

Hey ,

I checked, and the query is not picking up any index, though my assumption was that it should pick the OOTB cqPageLucene index, which has the jcr:content/cq:lastModified property. Any pointers on that? Is this index picked in your environment?

 

Thanks,

 


Correct answer by
Level 9

Yes, it is being picked. Verified both in the SDK which has the same number of pages as well as on the actual cloud instance. So, there might be some issue with indexes in your project. Check if you have any custom indexes in your project which might be conflicting.


Level 4

Hi @h_kataria ,

 

I was able to identify and fix the index issue. The query now fetches the required modified pages when fired from the root level, and it picks up the OOTB cqPageLucene index. The ACS packager utility also started working, since it depended on the same query. Thanks!


Level 1

@tnik 

I have analyzed the solution for the problem statement, which is that we need to move content between two cloud instances automatically via packages. The requirement is to find and package the pages changed by the content author team within a specified timeframe and move them to the target instance for selective sync. Here is my proposed solution.

Proposed Solution:

  1. References Handling: It is crucial to account for any references that were updated, such as images or experience fragments. Use the references.json API call (the one that happens during the publish event) to track these updates.

  2. Automated Content Movement: I recommend building a standard utility to automate content movement with the following steps:

    1. Author Selection: The author selects the source and destination for content movement.

    2. Package Creation: Use the JcrPackageManager APIs to create packages.

      • Run the query (consider if indexing can help or if project-wise queries are necessary).
      • Fetch updated references for each response path.
      • Create a list of paths to be packaged.
      • Generate a package for movement (assuming there aren't thousands of page updates in a day).
    3. Servlet for Package Transfer: Pass the package to a servlet on the target instance with a technical token, and use a Sling job to process and install it.

    4. Callback Servlet: Implement a callback servlet to notify the source instance about successful installation. If there are many paths, consider the timeout possibility, as HTTP requests can have long wait times.

    5. Author Notification: The callback at the source instance should trigger a notification to inform the author about the successful content movement.

      This might not be the complete solution, or exactly how it should be done, but it gives a sketch and highlights the different roles and actors that need to be addressed to solve this in an automated way.

      Regards,
      Divanshu
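
The "create a list of paths to be packaged" step above, combined with the earlier suggestion of one package per site, can be sketched as follows. This is a minimal Python illustration under assumptions: the site root is taken to be the first two path segments (for example /content/mysite), and the batch size of 500 is an arbitrary, hypothetical cap so that no single package grows too large.

```python
from collections import defaultdict


def site_root(path, depth=2):
    """Return the first `depth` segments, e.g. /content/mysite for depth 2."""
    parts = path.strip("/").split("/")
    return "/" + "/".join(parts[:depth])


def plan_packages(paths, batch_size=500, depth=2):
    """Group modified page paths by site, then split each site into batches.

    Returns a list of (site, batch_number, paths) tuples, one per package."""
    by_site = defaultdict(list)
    for path in sorted(set(paths)):
        by_site[site_root(path, depth)].append(path)

    plan = []
    for site, site_paths in by_site.items():
        for start in range(0, len(site_paths), batch_size):
            batch_number = start // batch_size + 1
            plan.append((site, batch_number, site_paths[start:start + batch_size]))
    return plan
```

Each tuple maps to one package, so a failed install on the target only needs that batch to be rebuilt and resent.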