SOLVED

Scheduled job to import data

Level 4

We have a requirement to call an external REST service, fetch data, and create JCR node entries at a certain time each day.

1. Can someone guide me on the best option to achieve that?

We also use Jenkins in our project, which could probably be used to schedule the job?

2. I can write a Java class to write the JCR data. Can that Java class be called from a scheduler?

Community Advisor

Hi @Mayukh007

I'd suggest you create a Java scheduler and call your service from it. That way the code is easy to maintain and debug. As for the implementation:

1. Write a Java scheduler and set its scheduler.expression to the time you want it to run (http://www.cronmaker.com/ can help build the expression). Inside the run() method, call your service implementation methods.

2. Write a service interface that declares the methods you need (such as fetchGetResponse(), writeDataToJcrNodes(), etc.).

3. Implement all of these methods in your service Impl class.

These three Java classes are the minimum you need to write to meet your requirement.
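The three-class layout described above could be sketched roughly as follows. This is only a structural outline in plain Java so that it compiles without AEM dependencies: the names DataImportService, RecordingDataImportService and DataImportScheduledTask are hypothetical, and the OSGi annotations (the component registration, the scheduler.expression property and the @Reference injection of the service) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service contract; the method names follow the ones mentioned above.
interface DataImportService {
    String fetchGetResponse();
    void writeDataToJcrNodes(String response);
}

// Hypothetical implementation. A real one would call the REST endpoint and
// write to the JCR; this one only records the calls so the flow is visible.
class RecordingDataImportService implements DataImportService {
    final List<String> calls = new ArrayList<>();

    @Override
    public String fetchGetResponse() {
        calls.add("fetch");
        return "{\"data\":[]}"; // placeholder payload, not real service output
    }

    @Override
    public void writeDataToJcrNodes(String response) {
        calls.add("write:" + response);
    }
}

// The scheduler stays thin: run() only delegates to the service.
class DataImportScheduledTask implements Runnable {
    private final DataImportService service;

    DataImportScheduledTask(DataImportService service) {
        this.service = service;
    }

    @Override
    public void run() {
        String response = service.fetchGetResponse();
        service.writeDataToJcrNodes(response);
    }
}
```

Keeping the fetch and write logic in the service rather than in run() also means the same service can be called from somewhere else later, without touching the scheduler.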

---------------------------

Logic to process the response and convert it to JCR nodes:

	public void testWriteToJCR(Session session) {

		final String rootPath = "/content/mySite/test-rest";
		// Query strings for the pages to fetch from the sample API
		String[] pages = {"page=1", "page=2"};

		if (!session.isLive()) {
			return;
		}

		// One HTTP client for all requests, closed automatically at the end
		try (CloseableHttpClient client = HttpClientBuilder.create().build()) {
			// Remove the previous import so each run starts from a clean root
			if (session.itemExists(rootPath)) {
				LOG.debug("Removing existing node: {}", rootPath);
				session.getNode(rootPath).remove();
				session.save();
			}
			Node jobsRootNode = JcrUtil.createPath(rootPath,
				JcrResourceConstants.NT_SLING_ORDERED_FOLDER, session);

			for (String page : pages) {
				HttpGet get = new HttpGet("https://reqres.in/api/users?" + page);
				ResponseHandler<String> responseHandler = new BasicResponseHandler();
				String response = client.execute(get, responseHandler);
				JSONObject jsonResponse = new JSONObject(response);
				// Read the array by key instead of by position; this assumes the
				// records sit under "data", as in the reqres.in sample response
				JSONArray jsonArray = jsonResponse.getJSONArray("data");
				Node pageNode = jobsRootNode.addNode(page, NodeType.NT_UNSTRUCTURED);
				session.save();

				// One child node per record, one string property per JSON key
				for (int i = 0; i < jsonArray.length(); i++) {
					JSONObject jobObject = jsonArray.getJSONObject(i);
					Node jobNode = pageNode.addNode(jobObject.get("id").toString(), NodeType.NT_UNSTRUCTURED);

					Iterator<String> keys = jobObject.keys();
					while (keys.hasNext()) {
						String nextKey = keys.next();
						jobNode.setProperty(nextKey, jobObject.get(nextKey).toString());
					}
				}
				session.save();
				LOG.debug("REST response converted to JCR nodes successfully");
			}
		} catch (RepositoryException | IOException | JSONException e) {
			LOG.error("Failed to convert REST response to JCR nodes", e);
		}
	}

 

Thanks,

Bilal.

Level 10

Hi,

On behalf of the community, thank you for providing an in-depth answer with code snippets.

Personally though, I wouldn't call session.save() inside the try block until the very end. If an exception is thrown halfway through, you are left with half the nodes saved and the others not (in other words, a messed-up node structure).
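The all-or-nothing idea can be illustrated with a small plain-Java sketch. It does not use the JCR API: the map is only a stand-in for the repository, the class name StagedImport is hypothetical, and the comments point out which step would correspond to session.save() or session.refresh(false) in real repository code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model: the "repository" is a plain map of path -> value, NOT a JCR session.
class StagedImport {

    // Returns true if every record was imported, false if nothing was written.
    static boolean importAll(Map<String, String> repository, List<String[]> records) {
        // Plays the role of unsaved (transient) session changes
        Map<String, String> pending = new LinkedHashMap<>();
        try {
            for (String[] record : records) {
                if (record.length != 2 || record[0] == null) {
                    throw new IllegalArgumentException("malformed record");
                }
                pending.put("/content/mySite/test-rest/" + record[0], record[1]);
            }
        } catch (IllegalArgumentException e) {
            // Similar in spirit to session.refresh(false): drop everything staged so far
            return false;
        }
        // Similar in spirit to a single session.save() at the very end
        repository.putAll(pending);
        return true;
    }
}
```

With this shape, a failure on the last record leaves the repository exactly as it was before the import started.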

@Mayukh007 as @bilal_ahmad says, you should implement this as an AEM scheduler, NOT as something in Jenkins. If you're looking for documentation, the official Adobe tutorial on schedulers is here: https://helpx.adobe.com/experience-manager/using/aem-first-components1.html#AddJavafilestotheMavenpr...

Level 4
Thank you so much Bilal and theop76211228 for your answers and comments. Will give it a try...

Level 4

@bilal_ahmad 

 

Hi Bilal, I am getting an error when obtaining a handle to the session to start the JCR operations, after I fetch the data from the service.

I have a service user with CRUD rights, which I will use for the JCR operations.

In my scheduler class, inside the run method, I have this code:

String serviceUser = CommonUtil.getProperty(context, QNA_DATALOAD_CONFIG, SERVICE_USER);

--> the code above correctly returns the service user I configured.

ResourceResolver resourceResolver = new ResourceResolverUtil().getResourceResolverViaAcl(resolverFactory,serviceUser);

--> the error occurs on the line above

// The getResourceResolverViaAcl method is defined in a class called ResourceResolverUtil.

public ResourceResolver getResourceResolverViaAcl(ResourceResolverFactory resolverFactory, String serviceUser) {
    try {
      Map<String, Object> param = new HashMap<String, Object>();
      param.put(ResourceResolverFactory.SUBSERVICE, serviceUser);
      // error occurring in below line
      ResourceResolver resourceResolver = resolverFactory.getServiceResourceResolver(param);
      return resourceResolver;
    }
    catch (Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

 

error.log shows below error:

26.05.2020 00:00:00.012 *ERROR* [sling-default-1-Registered Service.22296] org.apache.sling.commons.scheduler.impl.QuartzScheduler Exception during job execution of job 'ca.manulifeglobal.core.schedulers.QnAScheduledTask@1cc2f04a' with name 'Registered Service.22296' : null
java.lang.IllegalStateException: null
at ca.manulifeglobal.core.util.ResourceResolverUtil.getResourceResolverViaAcl(ResourceResolverUtil.java:36) [ca.manulife.dxp.aem-global:0.4.66.SNAPSHOT]
at ca.manulifeglobal.core.services.impl.QnASchedulerServiceImpl.callQnAMaker(QnASchedulerServiceImpl.java:79) [ca.manulife.dxp.aem-global:0.4.66.SNAPSHOT]
at ca.manulifeglobal.core.schedulers.QnAScheduledTask.run(QnAScheduledTask.java:113) [ca.manulife.dxp.aem-global:0.4.66.SNAPSHOT]
at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:347) [org.apache.sling.commons.scheduler:2.7.2]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [org.apache.sling.commons.scheduler:2.7.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
at ca.manulifeglobal.core.util.ResourceResolverUtil.getResourceResolverViaAcl(ResourceResolverUtil.java:32) [ca.manulife.dxp.aem-global:0.4.66.SNAPSHOT]
... 7 common frames omitted
26.05.2020 00:00:00.014 *INFO* [sling-default-5-com.adobe.granite.threaddump.impl.BackupCleaner] com.adobe.granite.threaddump.impl.BackupCleaner All backup(s) successfully removed.
26.05.2020 00:00:00.014 *INFO* [sling-default-4-com.day.cq.dam.similaritysearch.internal.scheduler.PeriodicAutoTaggingJob.4560] com.day.cq.dam.similaritysearch.internal.scheduler.PeriodicAutoTaggingJob Smart Tags not configured. Ignoring periodic job.

 

Any help will be greatly appreciated.
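A side note on the log above: it reads "IllegalStateException: null" because the catch block reuses e.getMessage(), and the underlying NullPointerException carries no message. A wrapper that always names the failing step and the exception type makes such a log easier to read. The following is only a sketch; the helper class ResolverErrors and its wrap method are hypothetical names, not part of Sling or AEM.

```java
class ResolverErrors {

    // Hypothetical helper: builds a wrapper exception whose message is never null.
    static IllegalStateException wrap(String what, Exception cause) {
        String detail = cause.getMessage() != null ? cause.getMessage() : "no message";
        return new IllegalStateException(
            what + " failed: " + cause.getClass().getSimpleName() + " (" + detail + ")",
            cause);
    }
}
```

In the catch block this would be used as throw ResolverErrors.wrap("getServiceResourceResolver", e);. Separately, the top frame of the NullPointerException is inside getResourceResolverViaAcl itself, which suggests (though the thread does not confirm it) that resolverFactory may be null when the scheduler runs, so that is worth checking first.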

Correct answer by
Community Advisor

A few questions before you implement a scheduler:

1. How frequently will you call the REST API?

2. How frequently could the data from the REST API change?

 

Level 4

1. It will run once a day to start with (I'm not sure whether there is a manual option to trigger the job if required).

2. The data in the REST API will change daily. Due to the nature of the data, we will delete the existing JCR nodes (imported the day before) and do a fresh import of all data. We have an endpoint that fetches all the data in one call.

Community Advisor

Yes, you can easily achieve this with a scheduler.

You can add a cron job that runs daily.

Through the config, you can change the expression whenever you want it to run at a different time.
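For reference, the Sling scheduler takes Quartz-style cron expressions, which start with a seconds field. A couple of example values for a daily job are shown below as plain Java constants; the class name, the constant names and the small field-count helper are hypothetical and only there for illustration.

```java
class CronExamples {

    // Quartz-style field order: second minute hour day-of-month month day-of-week
    static final String DAILY_AT_MIDNIGHT = "0 0 0 * * ?";
    static final String DAILY_AT_0230 = "0 30 2 * * ?";

    // Counts whitespace-separated fields; Quartz-style expressions have 6,
    // or 7 when the optional year field is present.
    static int fieldCount(String expression) {
        return expression.trim().split("\\s+").length;
    }
}
```

Either string could be used as the scheduler.expression value of the scheduler component.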

If you also want to trigger it manually, create a servlet that calls your service, which in turn calls the REST API and does the rest of the work. Use that only for manual runs; the recommended way is still to create a scheduler.

If you need more help with the scheduler, let us know, or we have a HelpX article you can look at.