
SOLVED

Would clustering be an appropriate solution to this problem?


Level 2

I'm trying to pin down the right way to design this sub-part of our CQ installation, and am looking for feedback from you guys.

The problem is that I have a host running outside of our CQ installation, and it repopulates a database (also outside of CQ), say, once a day.  The job might take 5-15 minutes or so, but that's not especially relevant.  When the job is done, I want it to hit a URL in our CQ installation with a POST request, something like http://host/updateJcrStats?api_key=1234832423859235 (the api_key would be a simple way of preventing random web users from forcing this update).
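On the servlet side, that api_key check can be a constant-time string comparison so the comparison itself doesn't leak anything via timing. A minimal sketch of just the check (the key value is the one from the example URL above; the class and method names are made up, and in practice the expected key should come from configuration, not source code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ApiKeyCheck {
    // Hypothetical: in a real servlet this would be read from OSGi config.
    static final String EXPECTED_KEY = "1234832423859235";

    /** Constant-time comparison of the supplied key against the expected one. */
    static boolean isValidKey(String candidate) {
        if (candidate == null) {
            return false;
        }
        return MessageDigest.isEqual(
                candidate.getBytes(StandardCharsets.UTF_8),
                EXPECTED_KEY.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("1234832423859235")); // true
        System.out.println(isValidKey("wrong-key"));        // false
    }
}
```

The servlet would read the `api_key` request parameter, call this check, and return 403 on a mismatch.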

Since this request comes from the external web, it will hit a publish server (of which we have 5 or 6).

I am thinking of creating a Job along the lines of the offloading docs (http://docs.adobe.com/docs/en/cq/current/deploying/offloading/dev-offloading.html).  The Job would be created on whichever publish server the load balancer picked, and one specific author host would receive it as a JobConsumer.  That consumer would connect to the freshly repopulated database I mentioned and calculate some stats; for now just simple things like num_rows, but eventually they could be more complex.
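For reference, the Job half of this idea would look roughly like the sketch below, using Sling's job API. This only compiles inside an AEM/Sling bundle (not standalone), and the topic and property names are made up for illustration:

```java
// Sketch only: requires the org.apache.sling.event bundle; not standalone-runnable.
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.consumer.JobConsumer;

// On the publish instance that received the POST, a JobManager reference
// (injected via @Reference) would create the job:
//
//   jobManager.addJob("com/example/jcrstats/update",          // made-up topic
//       Collections.singletonMap("triggeredAt", (Object) System.currentTimeMillis()));

// On the instance assigned to consume that topic (registered as an OSGi
// component with the "job.topics" property set to the topic above):
public class JcrStatsJobConsumer implements JobConsumer {
    @Override
    public JobResult process(Job job) {
        // Connect to the external database, compute num_rows etc.,
        // write the results into the JCR, then replicate them.
        return JobResult.OK;
    }
}
```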

In the end, I would want to programmatically replicate the node containing the freshly calculated values to the other author hosts and to the publish hosts.

So, two main issues:

1. Would putting all of the publish hosts plus one author host into a cluster, with the author as the consumer of that Job topic, be the right way to push the Job over to the author?  (Reverse replication seems to be another option, but I'd prefer to do the processing on the author to avoid stealing compute power from the hosts that are more likely to be serving web content to our actual users.)

2. Is there a way to programmatically or automatically replicate the data once this author-side computation is done and the JCR nodes are updated?  (The JCR nodes will only contain data calculated in this run, so I'm not worried about accidentally pushing, say, a content editor's changes prematurely.)

Any help with either of these would be awesome.  Just a quick pointer if I'm thinking about clustering or replication incorrectly...still fairly new to the AEM and CMS worlds here, and trying to get things right.  

Thanks

--Tom

1 Accepted Solution


Correct answer by
Employee

In a word... no, this isn't a good use of clustering. You can't have a publish and an author instance in the same cluster; their content would conflict with each other.

I'm not sure why you're opposed to using reverse replication; it seems like a natural fit. It would work as follows:

1. External client calls http://host/updateJcrStats?api_key=1234832423859235

2. The servlet creates a marker node (maybe with just a timestamp; it almost doesn't matter) and reverse-replicates it.

3. On author, a workflow launcher watches for that marker node to be updated. It fires a workflow which does whatever update you need. At the end of the workflow, the updated data is replicated.
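The workflow step at the end could be sketched as a custom process step that computes the stats and then activates the updated node with the Replicator service. This only compiles inside an AEM bundle, and the node path and stats logic below are placeholders:

```java
// Sketch only: depends on the AEM workflow and replication APIs; not standalone-runnable.
import javax.jcr.Session;
import com.day.cq.replication.ReplicationActionType;
import com.day.cq.replication.Replicator;
import com.day.cq.workflow.WorkflowException;
import com.day.cq.workflow.WorkflowSession;
import com.day.cq.workflow.exec.WorkItem;
import com.day.cq.workflow.exec.WorkflowProcess;
import com.day.cq.workflow.metadata.MetaDataMap;

public class UpdateJcrStatsProcess implements WorkflowProcess {

    private Replicator replicator; // injected via @Reference in a real component

    @Override
    public void execute(WorkItem item, WorkflowSession wfSession, MetaDataMap args)
            throws WorkflowException {
        try {
            Session session = wfSession.getSession();
            // 1. Connect to the external database and compute the stats.
            // 2. Write them under a dedicated node, e.g. /var/jcrstats (placeholder path).
            session.save();
            // 3. Push the updated node out to the publish instances.
            replicator.replicate(session, ReplicationActionType.ACTIVATE, "/var/jcrstats");
        } catch (Exception e) {
            throw new WorkflowException("Stats update/replication failed", e);
        }
    }
}
```

Because the replicated path contains only the computed stats, this step won't drag along any unrelated author content.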

HTH,

Justin


3 Replies



Level 2

Thanks for your quick and super detailed response, Justin.  Clustering is clearly not a valid solution if I can't put author and publish hosts in the same cluster, so I'll do this with reverse replication and a workflow, and see how that goes.


--Tom
