Advanced workflow data enrichment

Darren_Bowers

MVP

08-03-2020

Hi All - I want to do some advanced data enrichment for all targets in a workflow, which ultimately requires a fair amount of complex JavaScript. I have two large switch and if/then/else statements that populate three columns (two text and one bit.ly link) based on Recipient attributes; these columns are then sent in the email or SMS.

Right now, I am using a JS queryDef select to get all records, then iterating over them one by one to update the corresponding fields for each target. This updates the temporary schema in the workflow.
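Roughly, the current approach looks like this (a minimal sketch - the id/segmentCode/sNote column names and the one-line rule are placeholders for the real switch/if logic):

//Get the schema name (skipping namespace)
var schemaName = vars.targetSchema.substring(vars.targetSchema.indexOf(":") + 1);

//Select every record from the workflow's temporary schema in one go
var records = xtk.queryDef.create(
  <queryDef schema={vars.targetSchema} operation="select">
    <select>
      <node alias="id" expr="@id"/>
      <node alias="segmentCode" expr="@segmentCode"/>
    </select>
  </queryDef>
).ExecuteQuery();

//Enrich each record and write the result back to the temp table
for (var key in records[schemaName]) {
  var record = records[schemaName][key];
  //placeholder for the big switch / if-then-else business rules
  var note = (record.segmentCode == "MySegCode") ? "Something" : "Missing";
  sqlExec("update " + vars.tableName + " set sNote='" + note + "' where iId=" + record.id + ";");
}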

This is an issue for two reasons:

  1. Anything over 10,000 records requires paging the queryDef, since an individual select can return at most 10k rows
  2. Somewhere past 10k records you run out of JS memory (even after increasing the JS memory limit)

Adobe Support says it's bad practice, but hasn't suggested an alternative, as this is probably a Consulting engagement.

Is there a better way to do advanced enrichment during a workflow with >10k records?

enrichment javascript workflow

Accepted Solutions (1)

Darren_Bowers

MVP

11-03-2020

I found this, which is exactly what I want to do (but with queryDef paging): https://technowide.net/2017/05/19/iterating-custom-functions-record/

But I can't see where the TempResponse temporary schema is created in the example. Is it created as a normal schema beforehand? I think I am missing something here...
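In case it helps anyone later: my assumption (not confirmed) is that the temporary schema in the example is created by the workflow itself rather than beforehand - a JS activity sees the temp schema and table from its incoming transition as vars.targetSchema and vars.tableName, which is easy to verify:

//quick sanity check inside a workflow JS activity
logInfo("target schema: " + vars.targetSchema); //e.g. something like "temp:query"
logInfo("target table: " + vars.tableName);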

Answers (3)

Milan_Vucetic

MVP

09-03-2020

Hi @Darren_Bowers,

 

If you already have the JS and are hitting the JS memory / record limits, you can select records in batches.

Set the batch size to 5K or so (anything below the 10K cap) and you can process as many records as you want without errors.

 

var currentStartLine = 0;
var recordsToFetchPerBatch = 5000;
var loop = true;

while (loop) {
  var query = xtk.queryDef.create(
    <queryDef schema="xtk:some_schema" operation="select" lineCount={recordsToFetchPerBatch} startLine={currentStartLine}>
      <select>
        <node expr="@node"/>
        <!-- ... remaining columns ... -->
      </select>
    </queryDef>
  );
  var result = query.ExecuteQuery();
  if (Object.keys(result).length > 0) {
    //do stuff
    currentStartLine = currentStartLine + recordsToFetchPerBatch;
  } else {
    loop = false;
  }
}

 

Regards,

Milan

Thanks Milan - I was doing this previously but still found that after a few batch loads it ran out of memory. I think it's because some of the objects don't release their JS memory properly. It's hard to diagnose because of the lack of tools available to profile the memory.

Hi @DarenOrigin, I assume that is the case.

 

If you are using a custom buffer, there is a function to release the memory:

var buffer = new MemoryBuffer();

buffer.dispose();
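For example, disposing at the end of each batch so the memory is reclaimed between iterations (a sketch - how the buffer gets filled depends on your code):

while (loop) {
  var buffer = new MemoryBuffer();
  //... fill and use the buffer for the current batch ...
  buffer.dispose(); //release the buffer before the next batch
}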

 

Regards,

Milan

Hi Milan. I found an issue with the method you describe. Let's say I get records in batches of 100: I read 100 records, enrich them, then write them back to the temporary schema. When I go to read the next 100 records, the result contains many of the records I already wrote back in the first read. How do you ensure that the 2nd read of 100 only contains records that haven't been updated yet?

Hi @DarenOrigin,

 

This method works fine if you are using the incoming transition to your JS.

All data is stored in the temporary table, and the JS fetches the first 100 rows from there (from rowId = 0), then the next 100 (from rowId = 101), and so on.

I have not tried reading data directly from the table.

Also ensure there are no duplicates in the table; otherwise you can get the same record in two different batches.

 

Regards,

Milan 

It's even stranger than what I just described - if I have 250 total records and load 100 at a time, then once I write back the 100 from the first load (0-100), the queryDef only returns 50 records in the next select iteration (records 100-200). If I don't write back to the temporary schema, it loads all 250 records in three batches. Any idea how to get around this?
Hi @DarenOrigin, can you share your code? That would help identify the root cause. Regards,

Darren_Bowers

MVP

11-03-2020

Hi @Milan_Vucetic - attaching some example code here as I can't do it in the comment thread.

If you comment out the line that does the write-back (datalayer.writeRecord(record);), the query loads all records properly and in the correct batches. If you do write back to the temporary schema, the select misses a bunch of records.

 

/**
 *
 * Adds the bit.ly links and additional notes onto the schema records
 * 
 **/

//Get the schema name (skipping namespace)
var schemaName = vars.targetSchema.substring(vars.targetSchema.indexOf(":") + 1);

var datalayer = {
  /** 
   *  Get all Movers records in targetSchema
   *
   **/
  get: {
    records: function( recordsToFetchPerBatch,currentStartLine ) {
      var query = xtk.queryDef.create(  
        <queryDef schema={vars.targetSchema} operation="select" lineCount={recordsToFetchPerBatch} startLine={currentStartLine}>
          <select>
            <node alias="switchDocumentId" expr="@switchDocumentId" />
            <node alias="serviceOrderId" expr="@serviceOrderId" />
            <node alias="segmentCode" expr="@segmentCode" />
            <node alias="completedDate" expr="@completedDate" />
            <node alias="scheduledDate" expr="@scheduledDate" />
            <node alias="bitly" expr="@bitly" />
            <node alias="edmLink" expr="@edmLink" />
            <node alias="divisionId" expr="@divisionId" />
            <node alias="customerType" expr="@customerType" />
            <node alias="IrDescription" expr="@IrDescription" />
            <node alias="IrMessage" expr="IrMessage" />
          </select>
          <where>
            <condition expr="@switchDocumentId IS NOT NULL" bool-operator="AND" />
            <condition expr="@bitly IS NULL" bool-operator="AND"/>
          </where>
        </queryDef>
      ).ExecuteQuery();

      return query;
    }
  },

   /**
    * Write record back to temp schema
    *
    */
   writeRecord: function ( data ) {
      try {
        //NB: assumes the enriched values are SQL-safe; column names use the temp table's s/m type prefixes
        sqlExec("update " + vars.tableName +
                " set sbitly='" + data.bitly + "', sedmLink='" + data.edmLink +
                "', sIrDescription='" + data.IrDescription + "', mIrMessage='" + data.IrMessage +
                "' where sswitchDocumentId = '" + data.switchDocumentId +
                "' AND sserviceOrderId = '" + data.serviceOrderId + "';");
      } catch (e) {
        logError("Something bad happened in writeRecord: " + e);
      }
   }
}

var businessLogic = {

  /**
   * Logs all records in dataset
   *
   *
   */
  displayAllRecords : function(records, schemaName) {
    logInfo("Dumping all records:");
    for (var key in records[schemaName]) {
        if (records[schemaName].hasOwnProperty(key)) {
          logInfo(key + " -> " + records[schemaName][key]);
        }
    }
  },
  /**
   * Finds the bit.ly link for the given record
   *
   **/
  findBitlyLinks : function(records, schemaName) {
    logInfo("findBitlyLinks - starting...");
    for (var key in records[schemaName]) {

      var record = records[schemaName][key];

      //some business rules in here to determine the unique bit.ly and edmLink for each record

      record.bitly = "XXXXXX";
      record.edmLink = "http://www.somelink.com/blah.html";

    }
  },
  /**
   * Create the notes for each set of valid segment codes
   *
   **/
  createIRNotes : function(records, schemaName) {

    for (var key in records[schemaName]) {

      var record = records[schemaName][key];

      //the segment code drives which description/message pair is applied
      var segCode = String(record.segmentCode);

      switch (segCode) {

        case 'MySegCode':
          record.IrDescription = 'Something';
          record.IrMessage = 'Something Descriptive';
          break;

        case 'MyOtherSegCode':
          record.IrDescription = 'Something Else';
          record.IrMessage = 'Something Else Descriptive';
          break;

        default:
          record.IrDescription = 'Missing';
          record.IrMessage = 'Missing';
      }

      //write the record back
      datalayer.writeRecord(record);

    }
  }

}

 
var currentStartLine = 0;
var recordsToFetchPerBatch = 200;
var loop = true;
var records = {};

while(loop){

  //get the records from the targetSchema
  logInfo("Getting batch from " + currentStartLine + " to " + (currentStartLine+recordsToFetchPerBatch));
  records = datalayer.get.records(recordsToFetchPerBatch,currentStartLine);
  logInfo("Got " + Object.keys(records).length + " records");
  if( Object.keys(records).length > 0 ) {
    //do the Bit.ly links
    businessLogic.findBitlyLinks(records, schemaName);
    //create IR Notes (this is where the write happens)
    businessLogic.createIRNotes(records, schemaName);
    //increment loop 
    currentStartLine += recordsToFetchPerBatch;
    records = {};
  } else {
    logInfo("Exiting..");
    loop = false;
  }
}
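I haven't confirmed this, but my suspicion is that the paging and the where clause work against each other: the select filters on @bitly IS NULL, so each batch that gets written back drops out of the next select's result set, and advancing startLine on top of that shrinking set skips records. Under that assumption, one workaround is to always read from line 0 and let the filter itself exclude the rows already processed (this relies on every processed row getting @bitly populated, otherwise the loop never terminates):

var recordsToFetchPerBatch = 200;
var loop = true;
var records = {};

while (loop) {
  //always read from line 0: rows updated in the previous batch no longer
  //match "@bitly IS NULL", so the next unprocessed rows start at the top
  records = datalayer.get.records(recordsToFetchPerBatch, 0);
  if (Object.keys(records).length > 0) {
    businessLogic.findBitlyLinks(records, schemaName);
    businessLogic.createIRNotes(records, schemaName); //performs the write-back
  } else {
    loop = false;
  }
}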