SOLVED

Suggestions on publishing a large set of Content Fragments updated through overnight job

Level 2

We have an overnight job that takes data from a third-party source and updates tens of thousands of Content Fragments.

 

Our job then runs the OOTB Publish Content Tree Workflow to publish all the Content Fragments.

 

These publishing workflow jobs take a long time to run and often die after running for more than 24 hours.

 

The other issue is that the OOTB Publish Content Tree Workflow processes all the Content Fragments, of which there are roughly 100,000.

 

Is there an efficient way to publish only the updated Content Fragments (that is, only the ones that changed)?

 

Thank you!

 

 

 

1 Accepted Solution

Correct answer by
Community Advisor

@eagleinhills : How are you updating the CFs?

If the third-party data means that only selected CFs need to be updated, you can trigger the "Publish Content Tree Workflow" only for the content that was updated.
The workflow accepts several parameters; please check whether you are already following these steps: https://experienceleague.adobe.com/en/docs/experience-manager-cloud-service/content/operations/repli...

Parameters

  • includeChildren (boolean value, default: false). The value false means that only the path is published; true means that children are published too.

  • replicateAsParticipant (boolean value, default: false). If configured as true, replication uses the user ID of the principal that performed the participant step.

  • enableVersion (boolean value, default: false). This parameter determines if a new version is created upon replication.

  • agentId (string value, default means only agents for publish are used). It is recommended to be explicit about the agentId; for example, setting it to the value publish. Setting the agent to preview publishes to the preview service.

  • filters (string value, default means that all paths are activated). Available values are:

    • onlyActivated - only activate pages that have (already) been activated. Acts as a form of reactivation.
    • onlyModified - activate only paths which are already activated and have a modification date later than the activation date.
    • The above can be ORed with a pipe “|”. For example, onlyActivated|onlyModified.
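
To make the parameter list concrete, here is a minimal sketch in plain Java that assembles these parameters into a map, including the pipe-joined filters value. The class name PublishTreeArgs and its build method are hypothetical helpers, and the idea that this map is then handed to the workflow as its metadata is an assumption; check the linked documentation for how the workflow is actually started in your environment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collects Publish Content Tree parameters in a map (no AEM APIs used). */
final class PublishTreeArgs {

    private PublishTreeArgs() {}

    /**
     * @param includeChildren true to publish the children of the path as well
     * @param agentId         for example "publish" or "preview"
     * @param filters         any of "onlyActivated", "onlyModified"; empty means no filter
     */
    static Map<String, Object> build(boolean includeChildren, String agentId, List<String> filters) {
        Map<String, Object> args = new LinkedHashMap<>();
        args.put("includeChildren", includeChildren);
        args.put("replicateAsParticipant", false); // default listed above
        args.put("enableVersion", false);          // default listed above
        args.put("agentId", agentId);
        if (!filters.isEmpty()) {
            // Several filters are ORed with a pipe, e.g. onlyActivated|onlyModified
            args.put("filters", String.join("|", filters));
        }
        return args;
    }
}
```

For the nightly job in the question, something like PublishTreeArgs.build(true, "publish", List.of("onlyModified")) would describe a run that skips paths whose modification date is not later than their activation date.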

5 Replies

Employee Advisor

Hi @eagleinhills 

 

For the replication issue, check the error logs to see whether your queues get blocked over a specific period of time. Based on what you find, we can work out a way forward.

 

Published Content Fragments are updated according to their replication date, which is OOTB behavior. Do you have a business requirement that this behavior interferes with?

Level 2

Thank you @Kamal_Kishor. I was asking this question for another team. Let me check with them whether they are using the parameters/filters properly.

 

 

Level 2

One more thing I would like to mention: Publish Content Tree performs very poorly when publishing 100 thousand Content Fragments, which we sometimes need to do when all the Content Fragments are updated. It seems to fail even if we divide the content into subcategories of 10 thousand to 20 thousand fragments each.

 

For smaller updates this solution will work, but are there any suggestions for when we occasionally need to publish all of them with new data from the external source?

Community Advisor

@eagleinhills : Updating 10-20 thousand CFs in one go seems like a lot.
If possible, please review the approach and the need to update them every day. It may be worth finding another approach, as updating 100 thousand CFs (nodes) every day seems like a lot of work.

I hope I have not misunderstood your use case.
Thank you.
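
As one possible way to shrink the daily run, the overnight job could compare each incoming record with what it stored last time and touch only the fragments that differ, then hand the changed paths over in small batches. The sketch below is plain Java under stated assumptions: the class ChangedFragments, the fingerprint maps (path to a checksum of the third-party payload), and the batch size are all hypothetical and not part of AEM or of the workflow discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper: picks changed fragment paths and groups them into batches. */
final class ChangedFragments {

    private ChangedFragments() {}

    /** Returns the incoming paths whose fingerprint is new or differs from the stored one. */
    static List<String> changedPaths(Map<String, String> previous, Map<String, String> incoming) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            // previous.get(...) is null for paths that were never seen before
            if (!Objects.equals(entry.getValue(), previous.get(entry.getKey()))) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    /** Splits the paths into consecutive batches of at most batchSize entries. */
    static List<List<String>> batches(List<String> paths, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < paths.size(); start += batchSize) {
            int end = Math.min(start + batchSize, paths.size());
            result.add(new ArrayList<>(paths.subList(start, end)));
        }
        return result;
    }
}
```

Whether this helps depends on how much of the third-party data really changes each night; if most records are unchanged, only the changed paths would need to be updated and published.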