Using Query Builder queries to fetch required paths and automate package creation to move the content between cloud instances
Solved | July 23, 2024


Hi,

 

I am analysing a solution for a problem statement where we have to move content between two cloud instances in an automated way via packages. The requirement: say the content author team changes x pages over a span of 3 days; we need to find those pages, package them, and move them to the target instance. We want to achieve selective sync.

 

The solution I am considering is to run the query below, get the paths of the modified pages, package them, and move them to the target instance:

path=/content/abc/mno/site
type=cq:Page
daterange.property=jcr:content/cq:lastModified
daterange.lowerBound=2024-07-22T00:00:00.000Z
daterange.lowerOperation=>=
daterange.upperBound=2024-07-24T00:00:00.000Z
daterange.upperOperation=<=
p.hits=selective
p.properties=jcr:path
p.limit=-1
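As a sketch, the predicates above can be assembled programmatically and URL-encoded for the standard `/bin/querybuilder.json` servlet (the host and credentials in the comment are placeholders):

```python
from urllib.parse import urlencode

def modified_pages_query(root, lower_bound, upper_bound):
    """Query Builder predicates for cq:Page nodes modified in a date range."""
    return {
        "path": root,
        "type": "cq:Page",
        "daterange.property": "jcr:content/cq:lastModified",
        "daterange.lowerBound": lower_bound,
        "daterange.lowerOperation": ">=",
        "daterange.upperBound": upper_bound,
        "daterange.upperOperation": "<=",
        "p.hits": "selective",
        "p.properties": "jcr:path",
        "p.limit": "-1",
    }

params = modified_pages_query(
    "/content/abc/mno/site",
    "2024-07-22T00:00:00.000Z",
    "2024-07-24T00:00:00.000Z",
)
# Fire this as a GET against the instance, e.g.:
#   curl -u <user>:<pass> "https://<host>/bin/querybuilder.json?<query-string>"
url = "/bin/querybuilder.json?" + urlencode(params)
```

The JSON response's `hits` array then yields the `jcr:path` values to feed into package filters.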

 

Now below are my questions:

 

1. At which path level should I run this query, given that we have a large number of pages (say more than 5000) under the parent path?

2. If I have to run it at the parent level and traverse the nodes, should I increase the node traversal limit, tweak the query further, or tweak the index to handle this query? I do not want to hit the 100,000-node traversal exception.

3. For complete automation, what is the preferred way? I am thinking of an MCP utility or curl scripts via something like Jenkins jobs. The objective is maximum automation; we don't want authors to have to hand paths over to the tech team for packaging.

 

Would appreciate your thoughts on this.

 

Thanks 

Best answer by h_kataria

Hey,

I checked and the query is not picking up any index, though my assumption was that it should pick the OOTB cqPageLucene index, which covers the jcr:content/cq:lastModified property. Any pointers around that? Is this index picked in your environment?

 

Thanks,

 


Yes, it is being picked. Verified both in the SDK which has the same number of pages as well as on the actual cloud instance. So, there might be some issue with indexes in your project. Check if you have any custom indexes in your project which might be conflicting.

3 replies

aanchal-sikka
Community Advisor
July 24, 2024

@tnik 

Have you tried exploring https://adobe-consulting-services.github.io/acs-aem-commons/features/contentsync/index.html for incremental updates?

I guess it should meet your requirements of incremental updates.

 

A few suggestions:

- Preferably create separate packages to deal with each site.

- Always create a backup of the target and then install the latest package.

- Ensure you have set up the package filters properly. If you need just a page, make sure the filter does not impact its child pages.
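To make the last point concrete, here is a sketch of a FileVault workspace filter (paths are hypothetical) that packages a single page without pulling in its child pages: only the page node itself and its jcr:content subtree are included, so sibling and child pages stay untouched.

```xml
<workspaceFilter version="1.0">
    <!-- include only this page and its jcr:content subtree, not child pages -->
    <filter root="/content/mysite/en/page-a">
        <include pattern="/content/mysite/en/page-a"/>
        <include pattern="/content/mysite/en/page-a/jcr:content(/.*)?"/>
    </filter>
</workspaceFilter>
```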

 

 

We had created a curl utility for the sync and hosted it as an Azure Function.

Aanchal Sikka
tnik (Author)
Level 3
July 24, 2024

Hi @aanchal-sikka ,

 

I tried the ACS Commons Content Sync in detail. The problem is this: say we have a site /content/mysite/en with some 1000 pages, and we don't know how many of those pages have been modified in the last 10 days. If we run this utility on /content/mysite/en, it simply hangs, fails to build the catalogue, and cannot fetch the delta.

That is the reason I am thinking of using the above query at the root level (/content/mysite/en), but it needs to be optimised so it does not fail beyond 100,000 nodes.

 

Regarding the packages approach:

For sync, I want to package the pages or child pages modified in a defined time period. Can that be done via curl, or should I write a custom servlet which does all the query and package processing and is called via curl?
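For what it's worth, the curl-only route is feasible: the CRX Package Manager exposes an HTTP API for create, build, download, and upload-with-install. Below is a sketch that assembles that curl sequence (the host, credentials, group, and package names are placeholders, and the `update.jsp` filter call in particular should be verified against your AEM version):

```python
import json

def package_manager_commands(host, group, name, paths):
    """Assemble the curl calls against the CRX Package Manager HTTP API
    to package a list of page paths and push the result to a target."""
    pkg = f"/etc/packages/{group}/{name}.zip"
    # one filter root per modified page path, no extra include/exclude rules
    filters = json.dumps([{"root": p, "rules": []} for p in paths])
    return [
        # 1. create an empty package
        f"curl -u $AEM_USER:$AEM_PASS -X POST "
        f"'{host}/crx/packmgr/service/.json{pkg}?cmd=create' "
        f"-d packageName={name} -d groupName={group}",
        # 2. set the workspace filters from the query results
        f"curl -u $AEM_USER:$AEM_PASS -F path={pkg} -F packageName={name} "
        f"-F groupName={group} -F filter='{filters}' "
        f"'{host}/crx/packmgr/update.jsp'",
        # 3. build the package
        f"curl -u $AEM_USER:$AEM_PASS -X POST "
        f"'{host}/crx/packmgr/service/.json{pkg}?cmd=build'",
        # 4. download the built package
        f"curl -u $AEM_USER:$AEM_PASS '{host}{pkg}' -o {name}.zip",
        # 5. upload to the target instance and install in one call
        f"curl -u $TGT_USER:$TGT_PASS -F file=@{name}.zip -F name={name} "
        f"-F force=true -F install=true '$TARGET_HOST/crx/packmgr/service.jsp'",
    ]

cmds = package_manager_commands(
    "https://author.example.com", "delta-sync", "modified-pages",
    ["/content/mysite/en/page-a", "/content/mysite/en/page-b"],
)
```

A custom servlet doing the same via the JcrPackageManager API is the cleaner option for a product-grade utility, but this sequence is enough for a Jenkins-driven proof of concept.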

 

Also, this process has to be completely automated, as we have multiple sites to handle and do not want to handle the packages manually; that would not be scalable or maintainable.

 

Thanks,

 

arunpatidar
Community Advisor
July 24, 2024

Hi @tnik 
You can use Groovy to do a recursive tree search to avoid the node traversal issue, or you can look at the audit logs to identify page modification/creation.

https://adobe-consulting-services.github.io/acs-aem-commons/features/audit-log-search/index.html  

Arun Patidar
h_kataria
Community Advisor
July 24, 2024

Have you considered using https://adobe-consulting-services.github.io/acs-aem-commons/features/packagers/query-packager/index.html ?
Since you already have a query, this will handle your package creation.
And since you have already limited your query to cq:Page, I would assume you should not run into the traversal exception (no harm in creating the index though).
The only remaining piece is how you want to install this package, which can be done in several ways depending on your DevOps setup.

tnik (Author)
Level 3
July 24, 2024

Hi @h_kataria ,

Yes, I have explored this ACS AEM Commons utility. It fails when I try to run the above query on the topmost language node, but works fine for child paths.

tnik (Author)
Level 3
July 24, 2024

Quoting @h_kataria: Yes, it is being picked. Verified both in the SDK which has the same number of pages as well as on the actual cloud instance. So, there might be some issue with indexes in your project. Check if you have any custom indexes in your project which might be conflicting.


Hi @h_kataria ,

 

I could identify and fix the index issue; the query now fetches the required modified pages when fired from the root level, and it picks the OOTB cqPageLucene index. The ACS packager utility also started working, as it depends only on the query. Thanks!

July 24, 2024

@tnik 

I have analyzed the solution for the problem statement, which is that we need to move content between two cloud instances automatically via packages. The requirement is to find and package the pages changed by the content author team within a specified timeframe and move them to the target instance for selective sync. Here is my proposed solution.

Proposed Solution:

  1. References Handling: It is crucial to account for any updated references, such as images or experience fragments. Use the references.json API call (the one made during the publish event) to track these updates.

  2. Automated Content Movement: I recommend building a standard utility to automate content movement with the following steps:

    1. Author Selection: The author selects the source and destination for content movement.

    2. Package Creation: Utilize jcrPackageManager APIs to create packages.

      • Run the query (consider if indexing can help or if project-wise queries are necessary).
      • Fetch updated references for each response path.
      • Create a list of paths to be packaged.
      • Generate a package for movement (assuming there aren't thousands of page updates in a day).
    3. Servlet for Package Transfer: Pass the package to a servlet on the target instance with a technical token, and use a Sling job to process and install it.

    4. Callback Servlet: Implement a callback servlet to notify the source instance about successful installation. If there are many paths, consider the timeout possibility, as HTTP requests can have long wait times.

    5. Author Notification: The callback at the source instance should trigger a notification to inform the author about the successful content movement.

      This might not be the complete solution, or exactly how it should be done, but it sketches the different roles and actors that need to be covered to solve this in an automated way.
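The steps above can be sketched as a single pipeline. Everything here is a hypothetical skeleton: the collaborators `fetch_references`, `build_package`, `push_to_target`, and `notify` stand in for the references.json call, the JcrPackageManager work, and the transfer/callback servlets described above.

```python
def sync_modified_content(query_results, fetch_references, build_package,
                          push_to_target, notify):
    """Pipeline sketch: query -> references -> package -> transfer -> callback."""
    paths = set(query_results)
    # step 1: pull in updated references (images, experience fragments, ...)
    for page in list(paths):
        paths |= set(fetch_references(page))
    # step 2: hand the de-duplicated path list to the packaging layer
    package = build_package(sorted(paths))
    # step 3: ship to the target instance (servlet + technical token)
    installed = push_to_target(package)
    # steps 4-5: the callback fires the author notification
    notify(installed)
    return sorted(paths)

# demo run with inline stubs standing in for the real collaborators
notifications = []
synced = sync_modified_content(
    ["/content/mysite/en/page-a", "/content/mysite/en/page-b"],
    lambda p: ["/content/dam/mysite/hero.jpg"] if p.endswith("page-a") else [],
    lambda paths: {"name": "delta-pkg", "filters": paths},
    lambda pkg: True,
    notifications.append,
)
```

The point of the shape is that each actor (query, references, packaging, transfer, notification) stays swappable, so the same skeleton works whether the transfer leg is a servlet, a Jenkins job, or an Azure Function.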

      Regards,
      Divanshu