
SOLVED

Advanced workflow data enrichment


Level 9

Hi All - I want to do some advanced data enrichment for all targets in a workflow, which ultimately requires a decent amount of complex JavaScript. I have two large switch + if/then/else statements that populate three columns based on recipient attributes (two text and one bit.ly link), which get sent in the email or SMS.

Right now, I am using a JS queryDef select to get all records, then iterating over them one by one to update the corresponding fields for each target. This then updates the temporary schema in the workflow.

This is an issue for two reasons:

  1. With over 10,000 records the queryDef has to be paged, since an individual select returns at most 10k rows
  2. At some point past 10k records you run out of JS memory (even after increasing the JS memory limit)

Adobe Support says it's bad practice, but hasn't suggested an alternative, as this probably falls under Consulting.

Is there a better way to do advanced enrichment during a workflow with >10k records?


1 Accepted Solution


Correct answer by
Level 9

I found this, which is exactly what I want to do (but with queryDef paging): https://technowide.net/2017/05/19/iterating-custom-functions-record/

But I can't see where the TempResponse temporary schema is created in the example. Is it created as a normal schema beforehand? I think I am missing something here...


15 Replies


Community Advisor

Hi @Darren_Bowers ,

 

If you already have the JS and are hitting the JS memory / record-count limits, you can select the records in batches.

Set the batch size to 5K or so (but lower than 10K) and you will process as many records as you want without errors.

 

var currentStartLine = 0;
var recordsToFetchPerBatch = 5000;
var loop = true;

while (loop) {
  var query = xtk.queryDef.create(
    <queryDef schema="xtk:some_schema" operation="select" lineCount={recordsToFetchPerBatch} startLine={currentStartLine}>
      <select>
        <node expr="@node"/>
        ..
      </select>
    </queryDef>
  );
  var result = query.ExecuteQuery();
  if (Object.keys(result).length > 0) {
    // do stuff
    currentStartLine = currentStartLine + recordsToFetchPerBatch;
  } else {
    loop = false;
  }
}

 

Regards,

Milan


Level 9
Thanks Milan - I was doing this previously, but still found that after a few batch loads it ran out of memory. I think it's because some of the objects don't release their JS memory properly. It's hard to diagnose because of the lack of tools available to profile the memory.


Community Advisor

Hi @DarenOrigin, I assume that is the case.

 

If you are using a custom buffer, there is a function to release the memory:

var buffer = new MemoryBuffer();

buffer.dispose();

 

Regards,

Milan


Level 9
Hi Milan. I found an issue with using the method you describe. Let's say I get records in batches of 100. I read 100 records, enrich them, then write them back to the temporary schema. When I go to read the next 100 records, the batch contains many of the records I already wrote back in the first read. So I'm not sure how to ensure that the second read of 100 only contains records I haven't updated?


Community Advisor

Hi @DarenOrigin,

 

this method works fine if you are using the incoming transition to your JS.

All data is stored in the temporary table, and from there the JS fetches the first 100 (from rowId = 0), then another 100 (now from rowId = 101), etc...

I have not used it to read data directly from the table.

Make sure you have no duplicates there; in that case you can end up with the same record in two different batches.

 

Regards,

Milan 


Level 9
It's even stranger than what I just described: if I have 250 total records and load 100 at a time, then once I write back the 100 from the first load (0-100), the queryDef only returns 50 records in the next select iteration (records 100-200). If I don't write back to the temporary schema, it loads all 250 items in three batches. Any idea how to get around this?
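For what it's worth, the 250 → 100 + 50 behaviour can be reproduced outside Campaign with a plain-JavaScript sketch, assuming the write-back removes rows from the set the query selects over (for example via a filter on the enriched column). This is a simulation, not Campaign code:

```javascript
// Simulated temporary table: 250 rows, none enriched yet.
var table = [];
for (var i = 0; i < 250; i++) table.push({ id: i, bitly: null });

// Simulated queryDef: selects un-enriched rows, paged with startLine/lineCount.
function fetchBatch(startLine, lineCount) {
  var matching = table.filter(function (r) { return r.bitly === null; });
  return matching.slice(startLine, startLine + lineCount);
}

// Enrich + write back, advancing startLine each pass (the problematic pattern).
var startLine = 0, batchSize = 100, batchSizes = [];
while (true) {
  var batch = fetchBatch(startLine, batchSize);
  if (batch.length === 0) break;
  batchSizes.push(batch.length);
  // The write-back removes these rows from the filtered result set.
  batch.forEach(function (r) { r.bitly = "XXXXXX"; });
  startLine += batchSize;
}
// batchSizes ends up as [100, 50]: the second page starts at row 100 of a
// result set that now only has 150 rows left, so 100 rows are never seen.
```

Under this assumption, exactly 100 records (the "middle" of the original set) are silently skipped, which matches the symptom above.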


Community Advisor
Hi @DarenOrigin, can you share your code? That will be helpful to identify the root cause. Regards,


Level 9

Hi @Milan_Vucetic - attaching some example code here, as I can't do it in the comment thread.

If you comment out the line that does the write-back (datalayer.writeRecord(record);), the query loads all records properly and in the correct batches. If you do write back to the temporary schema, the select misses a bunch of records.

 

/**
 *
 * Adds the bit.ly links and additional notes onto the schema records
 * 
 **/

//Get the schema name (skipping namespace)
var schemaName = vars.targetSchema.substring(vars.targetSchema.indexOf(":") + 1);

var datalayer = {
  /** 
   *  Get all Movers records in targetSchema
   *
   **/
  get: {
    records: function( recordsToFetchPerBatch,currentStartLine ) {
      var query = xtk.queryDef.create(  
        <queryDef schema={vars.targetSchema} operation="select" lineCount={recordsToFetchPerBatch} startLine={currentStartLine}>
          <select>
            <node alias="switchDocumentId" expr="@switchDocumentId" />
            <node alias="serviceOrderId" expr="@serviceOrderId" />
            <node alias="segmentCode" expr="@segmentCode" />
            <node alias="completedDate" expr="@completedDate" />
            <node alias="scheduledDate" expr="@scheduledDate" />
            <node alias="bitly" expr="@bitly" />
            <node alias="edmLink" expr="@edmLink" />
            <node alias="divisionId" expr="@divisionId" />
            <node alias="customerType" expr="@customerType" />
            <node alias="IrDescription" expr="@IrDescription" />
            <node alias="IrMessage" expr="@IrMessage" />
          </select>
          <where>
            <condition expr="@switchDocumentId IS NOT NULL" bool-operator="AND" />
            <condition expr="@bitly IS NULL" bool-operator="AND"/>
          </where>
        </queryDef>
      ).ExecuteQuery();

      return query;
    }
  },

   /**
    * Write record back to temp schema
    *
    */
   writeRecord: function ( data ) {
      try {
        sqlExec("update " + vars.tableName + " set sbitly='" + data.bitly + "', sedmLink='" + data.edmLink + "', sIrDescription='" + data.IrDescription + "', " + "mIrMessage='" + data.IrMessage + "' where sswitchDocumentId = '" + data.switchDocumentId + "' AND sserviceOrderId = '" + data.serviceOrderId + "';");
      } catch (e) {
        logError("Something bad happened in writeRecord: " + e);
      }
   }
}

var businessLogic = {

  /**
   * Logs all records in dataset
   *
   *
   */
  displayAllRecords : function(records, schemaName) {
    logInfo("Dumping all records:");
    for (var key in records[schemaName]) {
        if (records[schemaName].hasOwnProperty(key)) {
          logInfo(key + " -> " + records[schemaName][key]);
        }
    }
  },
  /**
   * Finds the bit.ly link for the given record
   *
   **/
  findBitlyLinks : function(records, schemaName) {
    logInfo("findBitlyLinks - starting...");
    for (var key in records[schemaName]) {

      var record = records[schemaName][key];

      //some business rules in here to determine unique bit.ly and edmLink for each record

      record.bitly = "XXXXXX";
      record.edmLink = "http://www.somelink.com/blah.html";

    }
  },
  /**
   * Create the notes for each set of valid segment codes
   *
   **/
  createIRNotes : function(records, schemaName) {

    for (var key in records[schemaName]) {

      var record = records[schemaName][key];

      switch (record.segmentCode) {

        case 'MySegCode':
          record.IrDescription = 'Something';
          record.IrMessage = 'Something Descriptive';
          break;

        case 'MyOtherSegCode':
          record.IrDescription = 'Something Else';
          record.IrMessage = 'Something Else Descriptive';
          break;

        default:
          record.IrDescription = 'Missing';
          record.IrMessage = 'Missing';
      }

      //write the record back
      datalayer.writeRecord(record);

    }
  }

}

 
var currentStartLine = 0;
var recordsToFetchPerBatch = 200;
var loop = true;
var records = {};

while(loop){

  //get the records from the targetSchema
  logInfo("Getting batch from " + currentStartLine + " to " + (currentStartLine+recordsToFetchPerBatch));
  records = datalayer.get.records(recordsToFetchPerBatch,currentStartLine);
  logInfo("Got " + Object.keys(records).length + " records");
  if( Object.keys(records).length > 0 ) {
    //do the Bit.ly links
    businessLogic.findBitlyLinks(records, schemaName);
    //create IR Notes (this is where the write happens)
    businessLogic.createIRNotes(records, schemaName);
    //increment loop 
    currentStartLine += recordsToFetchPerBatch;
    records = {};
  } else {
    logInfo("Exiting..");
    loop = false;
  }
}
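If the root cause is indeed that the write-back removes rows from the filtered result set (the queryDef above selects only rows where @bitly IS NULL), one workaround is to stop advancing startLine altogether: every pass then reads the first page of whatever is still unprocessed, and the loop ends naturally when nothing matches. A plain-JavaScript sketch of the idea (simulated table, not Campaign code):

```javascript
// Simulated temporary table: 250 rows awaiting enrichment.
var table = [];
for (var i = 0; i < 250; i++) table.push({ id: i, bitly: null });

// Simulated queryDef with the WHERE clause that excludes processed rows;
// startLine stays pinned at 0 on every pass.
function fetchBatch(lineCount) {
  return table.filter(function (r) { return r.bitly === null; })
              .slice(0, lineCount);
}

var batchSize = 100, processed = 0;
while (true) {
  var batch = fetchBatch(batchSize);
  if (batch.length === 0) break;
  // Enrich + write back; these rows no longer match @bitly IS NULL.
  batch.forEach(function (r) { r.bitly = "XXXXXX"; });
  processed += batch.length;
}
// processed reaches 250 (batches of 100, 100 and 50) with nothing skipped.
```

This only works when the select's filter reliably excludes rows that were already written back, so it is a sketch under that assumption rather than a confirmed fix.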

 

 


Community Advisor
Hi @DarenOrigin, I am not able to test it, but from the code it seems that the temporary schema changes rowIds after your update, which then "confuses" the loader. Are you able to create a custom table/schema and insert the processed records into it? That should work for you. Regards, Milan


Level 9
Yeah, I am thinking the write-back to the temporary schema changes the cursor somehow, and the subsequent load skips a section. It would be great if Adobe could chime in on how to avoid this. I haven't created a temporary schema on the fly before, nor do I know how to reference it in the code - do you have any examples?




Community Advisor
Hi @DarenOrigin, from that page I can see that TempResponse is a custom table, which means it is created as a normal schema. It is not a temporary schema. Just create a new table and write the enriched records to it, and it will work for you. Creation of this table is not mentioned in the article; you have to do it on your own. Regards, Milan
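If you do go the custom-table route, one detail worth handling is quoting: the sqlExec call in the earlier code concatenates values straight into the statement, so a value containing a single quote would break it. A small string-building helper can avoid that; the table and column names below are hypothetical, so adjust them to whatever schema you create:

```javascript
// Builds an INSERT for a (hypothetical) custom enrichment table.
// Single quotes inside values are doubled, the standard SQL escape,
// so a value like "O'Brien" cannot break out of its string literal.
function buildInsert(tableName, row) {
  function esc(v) { return String(v).replace(/'/g, "''"); }
  return "insert into " + tableName +
    " (sswitchDocumentId, sbitly, sedmLink) values ('" +
    esc(row.switchDocumentId) + "', '" +
    esc(row.bitly) + "', '" +
    esc(row.edmLink) + "')";
}

// Example: build the statement that would be passed to sqlExec.
var sql = buildInsert("cusEnriched", {
  switchDocumentId: "A'1",
  bitly: "XXXXXX",
  edmLink: "http://www.somelink.com/blah.html"
});
```

This is just a sketch of the statement construction; the actual insert would still go through sqlExec inside the workflow.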


Level 9
Thanks Milan - looks like I will have to go the long way around on this. I was hoping not to have to create a new schema table to do it. I have a couple of workflows using the same enrichment, which now means I have to create multiple permanent "temporary" tables.


Level 1

It sounds like you're running into performance limitations with your current approach to data enrichment in your workflow. Here are a few alternative approaches you could consider:

  1. Batch processing: Instead of processing all records at once, you could split the records into smaller batches and process each batch individually. This would help reduce the memory footprint of your JavaScript code and also reduce the size of the queryDef select. You could use a loop to process each batch of records and update the corresponding fields in the temporary schema.
  2. Use a database: You could consider storing the data in a database, such as MySQL or PostgreSQL, and using a database query to retrieve the data and perform the enrichment. This would allow you to retrieve large amounts of data efficiently and also make it easier to maintain and update the data.
  3. Use a server-side script: Another option would be to use a server-side script, such as PHP or Python, to perform the data enrichment and then pass the results back to the workflow. This would help offload the processing from the workflow and also allow you to take advantage of the more powerful processing capabilities of the server.
  4. Consider alternative CRM data enrichment techniques: You could consider alternative data enrichment techniques, such as using lookup tables, data normalization, or data aggregation, to simplify the data enrichment process. These techniques can help reduce the complexity of your JavaScript code and also make it easier to maintain and update the data.

It's important to consider the specific requirements and constraints of your workflow when choosing the best approach for data enrichment. It's also a good idea to seek professional advice if you're unsure of the best way to proceed.