SOLVED

Clearing and refetching bulk content

Level 4

We need to bulk publish content (tens of thousands of pages). The pages are created by a scheduled event, so all of them require near-immediate publishing in one go. The main challenge is how to invalidate this content on the dispatcher (tens of thousands of pages) and then re-fetch the pages again (across multiple dispatchers).

What is the best strategy to do this? Is there a possibility of leveraging network storage for the dispatcher cache?

8 Replies

Community Advisor

Hi @Adilmo 

You can issue an HTTP request that causes the dispatcher to delete cached files, and immediately retrieve and recache the file.

Delete and immediately re-cache files when web sites are likely to receive simultaneous client requests for the same page. Immediate recaching ensures that Dispatcher retrieves and caches the page only once, instead of once for each of the simultaneous client requests.

 

POST /dispatcher/invalidate.cache HTTP/1.1
CQ-Action: Activate
Content-Type: text/plain
CQ-Handle: /content/something/en_us/123.html
Content-Length: 33

/content/something/en_us/123.html
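The raw request above can also be issued from a script. Below is a minimal sketch using only Python's standard library; the dispatcher host name is a placeholder assumption, and the page path matches the example above:

```python
# Sketch: build the delete-and-recache request shown above with Python's
# standard library. The host name is a placeholder assumption.
import urllib.request

def build_invalidation_request(host: str, page_path: str) -> urllib.request.Request:
    """Build the POST that asks Dispatcher to delete (and re-cache) one page."""
    return urllib.request.Request(
        url=f"http://{host}/dispatcher/invalidate.cache",
        data=page_path.encode("utf-8"),      # body: the page to invalidate
        headers={
            "CQ-Action": "Activate",         # Activate = delete + immediate re-fetch
            "CQ-Handle": page_path,
            "Content-Type": "text/plain",
        },
        method="POST",
    )

req = build_invalidation_request("dispatcher1.example.com",
                                 "/content/something/en_us/123.html")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

The send is left commented out so the sketch can be dry-run safely; in practice you would loop this over every page path in the bulk publish.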

 

Also, you can write a flush-cache servlet that sends an invalidate request to Dispatcher and re-caches the content. Please take the necessary precautions when implementing such a servlet. See the link below for more details:

https://experienceleague.adobe.com/docs/experience-manager-dispatcher/using/configuring/page-invalid...

 

Hope this helps!

Thanks!

 

Level 4
I know that, but my question is more about how to manage clearing and re-fetching thousands of pages across multiple dispatchers.

Correct answer by
Employee Advisor
Did it help? If not, can you provide information about why it did not solve your requirement?

Level 4
I knew about the re-fetching agent, but I also need advice on sharing a file system between dispatchers. Then we don't have to prime each and every dispatcher.

Employee Advisor

The feature linked above does not require any additional activities; it's a built-in feature of the dispatcher. But of course the re-fetching needs to happen on each publish/dispatcher instance.
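Since the re-fetch has to happen per instance, one way to drive it centrally is to fan out one invalidation request per (dispatcher, page) pair through a bounded thread pool, so that neither the dispatchers nor the publisher gets flooded. The sketch below uses Python's standard library; the host names, paths, and pool size are illustrative assumptions, not part of any product API:

```python
# Sketch: fan out cache invalidation to several dispatchers for a large page
# set. Hosts, paths, and pool size are illustrative assumptions.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

DISPATCHERS = ["dispatcher1.example.com", "dispatcher2.example.com"]

def invalidate(host, page_path, send=False):
    """Build (and optionally send) one delete-and-recache request."""
    req = urllib.request.Request(
        f"http://{host}/dispatcher/invalidate.cache",
        data=page_path.encode("utf-8"),
        headers={"CQ-Action": "Activate", "CQ-Handle": page_path,
                 "Content-Type": "text/plain"},
        method="POST",
    )
    if send:  # keep False for a dry run
        urllib.request.urlopen(req, timeout=10)
    return req

def invalidate_all(hosts, paths, max_workers=8):
    """One request per (dispatcher, page); the pool caps concurrency."""
    pairs = [(h, p) for h in hosts for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda hp: invalidate(*hp), pairs))

requests_built = invalidate_all(
    DISPATCHERS,
    ["/content/site/en/page1.html", "/content/site/en/page2.html"])
```

For tens of thousands of pages you would keep `max_workers` small and possibly batch the paths, trading total flush time for load on the publisher.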

 

Years back I tried to build a shared dispatcher cache using NFS. It worked for the most obvious cases, but under rare circumstances I got I/O errors on the dispatcher. I did not have the time to investigate it in detail, because it should work (the dispatcher does not use unusual system calls or the like). Maybe it was a problem with my setup or with one of the involved components (Linux kernel, nfsd, or a missing piece of configuration).

 

Things to consider when you switch to such a setup:

* Check how you want to do blue/green deployments on publish. In the farming approach (blue and green do not share any systems), it's easier to perform a blue/green deployment than when you have just a single shared cache.

* If you have any problem with that NFS share, your site isn't available anymore.

Community Advisor

As @Jörg_Hoh mentioned, you can go with the re-fetching flush agent for this purpose, with some modification. If the number of pages is 10k, then re-fetching all those 10k pages will flood your publisher with requests from the dispatcher, which again becomes a performance issue. So it is better to re-fetch only the most frequently requested of those 10k pages, such as the homepage, and cache all other pages on user request. With a bit of Java you can use the path/URL to decide which pages are allowed to be re-fetched.
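The strategy above amounts to a simple partition step: flush everything, but actively warm only a small allowlist of "hot" pages, letting the rest be re-cached lazily on the next real user request. A minimal sketch, where the page list and the allowlist are illustrative assumptions:

```python
# Sketch of the "re-fetch only hot pages" strategy: flush all bulk-published
# paths, but actively re-fetch only an allowlist. Paths are assumptions.
HOT_PAGES = {"/content/site/en.html"}

def partition(published_paths):
    """Split bulk-published paths into pages to re-fetch right after the
    flush (warm eagerly) and pages to flush only (re-cached lazily)."""
    refetch = [p for p in published_paths if p in HOT_PAGES]
    flush_only = [p for p in published_paths if p not in HOT_PAGES]
    return refetch, flush_only

refetch, flush_only = partition([
    "/content/site/en.html",
    "/content/site/en/products/a.html",
    "/content/site/en/products/b.html",
])
```

The same allowlist test could live in a Java filter on the flush agent, as suggested above; the partition itself is the whole idea.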

Hope this will help.

Umesh Thakur