Advanced workflow data enrichment | Community
Darren_Bowers
Level 9
March 9, 2020
Solved

Advanced workflow data enrichment


Hi All - I want to do some advanced data enrichment for all targets in a workflow, which ultimately requires a fair amount of complex JavaScript. I have two large switch and if/then/else statements that populate three columns based on Recipient attributes (two text fields and one bit.ly link), which are then used in the email or SMS.

Right now, I am using a JS queryDef select to get all records, then iterating over them one by one to update the corresponding fields for each target. This then updates the temporary schema in the workflow.

This is an issue for two reasons:

  1. Above 10,000 records the queryDef has to be paged, as an individual select returns at most 10k rows
  2. Somewhere past 10k records you run out of JS memory (even after increasing the JS memory limit)

Adobe Support says it's bad practice, but hasn't suggested an alternative, as this is probably a Consulting engagement.

Is there a better way to do advanced enrichment during a workflow with >10k records?

Best answer by Darren_Bowers


5 replies

Milan_Vucetic
Level 9
March 9, 2020

Hi @darren_bowers ,

 

if you already have JS and are running into the JS memory / record limits, you can select the records in batches.

Set the batch size to 5K or more (but lower than 10K) and you can process as many records as you want without errors.

 

var currentStartLine = 0;
var recordsToFetchPerBatch = 5000;
var loop = true;

while (loop) {
  var query = xtk.queryDef.create(
    <queryDef schema="xtk:some_schema" operation="select" lineCount={recordsToFetchPerBatch} startLine={currentStartLine}>
      <select>
        <node expr="@node"/>
        ..
      </select>
    </queryDef>
  );
  var result = query.ExecuteQuery();
  if (Object.keys(result).length > 0) {
    // do stuff
    currentStartLine = currentStartLine + recordsToFetchPerBatch;
  } else {
    loop = false;
  }
}

 

Regards,

Milan

Darren_Bowers
Level 9
March 9, 2020
Thanks Milan - I was doing this previously but still found that after a few batch loads it ran out of memory. I think it's because some of the objects don't release their JS memory properly. It's hard to diagnose because of the lack of tools available to profile the memory.
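For what it's worth, one generic pattern that can help with this (a plain JavaScript sketch, nothing Campaign-specific - `fetchBatch` and `processBatch` here are hypothetical stand-ins for the queryDef select and the enrichment logic) is to keep each batch scoped to a single loop iteration and drop the reference before fetching the next one, so the engine has a chance to reclaim the memory:

```javascript
// Generic sketch: process records in batches while keeping only one
// batch referenced at a time. fetchBatch/processBatch are hypothetical
// stand-ins for the Campaign queryDef select and enrichment logic.
function processInBatches(fetchBatch, processBatch, batchSize) {
  var start = 0;
  var total = 0;
  while (true) {
    var batch = fetchBatch(start, batchSize); // load one page of records
    if (batch.length === 0) break;            // no more records: stop
    processBatch(batch);                      // enrich this page only
    total += batch.length;
    start += batchSize;
    batch = null; // drop the reference so the batch can be collected
  }
  return total;
}
```

Whether this actually frees memory depends on the engine; in older interpreters, long-lived references accumulated in outer scope are a common cause of the growth described above.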
Darren_Bowers
Level 9
March 11, 2020

Hi @milan_vucetic - attaching some example code here as I can't do it in the comment thread.

If you comment out the line that does the write-back (datalayer.writeRecord(record);), then the query loads all records properly and in the correct batches. If you write back to the temporary schema, then the select misses a bunch of records.

 

/**
 * Adds the bit.ly links and additional notes onto the schema records
 **/

// Get the schema name (skipping namespace)
var schemaName = vars.targetSchema.substring(vars.targetSchema.indexOf(":") + 1);

var datalayer = {
  /**
   * Get all Movers records in targetSchema
   **/
  get: {
    records: function (recordsToFetchPerBatch, currentStartLine) {
      var query = xtk.queryDef.create(
        <queryDef schema={vars.targetSchema} operation="select" lineCount={recordsToFetchPerBatch} startLine={currentStartLine}>
          <select>
            <node alias="switchDocumentId" expr="@switchDocumentId"/>
            <node alias="serviceOrderId" expr="@serviceOrderId"/>
            <node alias="segmentCode" expr="@segmentCode"/>
            <node alias="completedDate" expr="@completedDate"/>
            <node alias="scheduledDate" expr="@scheduledDate"/>
            <node alias="bitly" expr="@bitly"/>
            <node alias="edmLink" expr="@edmLink"/>
            <node alias="divisionId" expr="@divisionId"/>
            <node alias="customerType" expr="@customerType"/>
            <node alias="IrDescription" expr="@IrDescription"/>
            <node alias="IrMessage" expr="@IrMessage"/>
          </select>
          <where>
            <condition expr="@switchDocumentId IS NOT NULL" bool-operator="AND"/>
            <condition expr="@bitly IS NULL" bool-operator="AND"/>
          </where>
        </queryDef>
      ).ExecuteQuery();
      return query;
    }
  },

  /**
   * Write record back to temp schema
   */
  writeRecord: function (data) {
    try {
      sqlExec("update " + vars.tableName +
        " set sbitly='" + data.bitly + "', sedmLink='" + data.edmLink +
        "', sIrDescription='" + data.IrDescription + "', " +
        "mIrMessage='" + data.IrMessage +
        "' where sswitchDocumentId = '" + data.switchDocumentId +
        "' AND sserviceOrderId = '" + data.serviceOrderId + "';");
    } catch (e) {
      logError("Something bad happened in writeRecord: " + e);
    }
  }
}

var businessLogic = {
  /**
   * Logs all records in dataset
   **/
  displayAllRecords: function (records, schemaName) {
    logInfo("Dumping all records:");
    for (var key in records[schemaName]) {
      if (records[schemaName].hasOwnProperty(key)) {
        logInfo(key + " -> " + records[schemaName][key]);
      }
    }
  },

  /**
   * Finds the bit.ly link for the given record
   **/
  findBitlyLinks: function (records, schemaName) {
    logInfo("findBitlyLinks - starting...");
    for (var key in records[schemaName]) {
      var record = records[schemaName][key];
      // some business rules in here to determine unique bit.ly and edmLink for each record
      record.bitly = "XXXXXX";
      record.edmLink = "http://www.somelink.com/blah.html";
    }
  },

  /**
   * Create the notes for each set of valid segment codes
   **/
  createIRNotes: function (records, schemaName) {
    for (var key in records[schemaName]) {
      var record = records[schemaName][key];
      switch (record.segmentCode) {
        case 'MySegCode':
          record.IrDescription = 'Something';
          record.IrMessage = 'Something Descriptive';
          break;
        case 'MyOtherSegCode':
          record.IrDescription = 'Something Else';
          record.IrMessage = 'Something Else Descriptive';
          break;
        default:
          record.IrDescription = 'Missing';
          record.IrMessage = 'Missing';
      }
      // write the record back
      datalayer.writeRecord(record);
    }
  }
}

var currentStartLine = 0;
var recordsToFetchPerBatch = 200;
var loop = true;
var records = {};

while (loop) {
  // get the records from the targetSchema
  logInfo("Getting batch from " + currentStartLine + " to " + (currentStartLine + recordsToFetchPerBatch));
  records = datalayer.get.records(recordsToFetchPerBatch, currentStartLine);
  logInfo("Got " + Object.keys(records).length + " records");
  if (Object.keys(records).length > 0) {
    // do the Bit.ly links
    businessLogic.findBitlyLinks(records, schemaName);
    // create IR Notes (this is where the write happens)
    businessLogic.createIRNotes(records, schemaName);
    // increment loop
    currentStartLine += recordsToFetchPerBatch;
    records = {};
  } else {
    logInfo("Exiting..");
    loop = false;
  }
}
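The skipped records are consistent with how offset paging interacts with the `@bitly IS NULL` filter: once a batch is written back, those rows no longer match the WHERE clause, so advancing startLine skips over rows that were never seen. A small generic JavaScript simulation (mock data only, no Campaign APIs - `fetchPage` is a stand-in for the filtered select) illustrates the effect, and one possible fix: when the filter itself excludes processed rows, always fetch from line 0 instead of advancing the offset.

```javascript
// Mock table: 10 rows, none enriched yet (bitly === null).
function makeRows() {
  var rows = [];
  for (var i = 0; i < 10; i++) rows.push({ id: i, bitly: null });
  return rows;
}

// Simulates: SELECT ... WHERE bitly IS NULL, paged by start/size.
function fetchPage(rows, start, size) {
  return rows.filter(function (r) { return r.bitly === null; })
             .slice(start, start + size);
}

// Offset paging: advance start each loop. Misses rows, because
// writing back shrinks the filtered result set underneath us.
function enrichWithOffset(rows, size) {
  var start = 0, batch;
  while ((batch = fetchPage(rows, start, size)).length > 0) {
    batch.forEach(function (r) { r.bitly = "XXXXXX"; });
    start += size; // BUG: the rows just updated dropped out of the filter
  }
  return rows.filter(function (r) { return r.bitly !== null; }).length;
}

// Fix: keep fetching from line 0. Processed rows no longer match the
// filter, so each fetch always returns the next unprocessed slice.
function enrichFromZero(rows, size) {
  var batch;
  while ((batch = fetchPage(rows, 0, size)).length > 0) {
    batch.forEach(function (r) { r.bitly = "XXXXXX"; });
  }
  return rows.filter(function (r) { return r.bitly !== null; }).length;
}
```

Whether this exactly matches the temporary schema's behaviour is an assumption on my part; row IDs shifting after the update (as suggested below in the thread) would produce the same kind of mismatch.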

 

 

Milan_Vucetic
Level 9
March 11, 2020
Hi @DarenOrigin, I am not able to test it, but from the code it seems that the temporary schema changes rowIds after your update, which then "confuses" the loader. Are you able to create a custom table/schema and insert the processed records into it? That should work for you. Regards, Milan
Darren_Bowers
Level 9
March 12, 2020

[deleted due to double post - the forum post ordering is broken]

Darren_Bowers (Author, accepted solution)
Level 9
March 12, 2020

I found this which is exactly what I want to do (but with queryDef paging): https://technowide.net/2017/05/19/iterating-custom-functions-record/ 

But I can't see where the TempResponse temporary schema is created in the example. Is this created as a normal Schema beforehand? I think I am missing something here...

Milan_Vucetic
Level 9
March 12, 2020
Hi @DarenOrigin, from this page I can see that TempResponse is a custom table, which means it is created as a normal schema; it does not represent a temporary schema. Just create a new table, write the enriched records to it, and it will work for you. The creation of this table is not mentioned in the article - you have to do it on your own. Regards, Milan
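For reference, a custom schema of that shape might look roughly like the following. This is a sketch only - the `cus` namespace, the `tempResponse` name, and the attribute list are assumptions based on the fields used earlier in this thread, not taken from the article:

```xml
<srcSchema name="tempResponse" namespace="cus" label="Temp Response">
  <element name="tempResponse" autopk="true" label="Temp Response">
    <attribute name="switchDocumentId" type="string" length="64"  label="Switch document id"/>
    <attribute name="serviceOrderId"   type="string" length="64"  label="Service order id"/>
    <attribute name="bitly"            type="string" length="255" label="Bit.ly link"/>
    <attribute name="edmLink"          type="string" length="255" label="EDM link"/>
    <attribute name="IrDescription"    type="string" length="255" label="IR description"/>
    <attribute name="IrMessage"        type="memo"               label="IR message"/>
  </element>
</srcSchema>
```

Once the schema is saved and the database structure updated, the enrichment script can insert or update records in it instead of the workflow's temporary table.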
February 10, 2023

It sounds like you're running into performance limitations with your current approach to data enrichment in your workflow. Here are a few alternative approaches you could consider:

  1. Batch processing: Instead of processing all records at once, you could split the records into smaller batches and process each batch individually. This would help reduce the memory footprint of your JavaScript code and also reduce the size of the queryDef select. You could use a loop to process each batch of records and update the corresponding fields in the temporary schema.
  2. Use a database: You could consider storing the data in a database, such as MySQL or PostgreSQL, and using a database query to retrieve the data and perform the enrichment. This would allow you to retrieve large amounts of data efficiently and also make it easier to maintain and update the data.
  3. Use a server-side script: Another option would be to use a server-side script, such as PHP or Python, to perform the data enrichment and then pass the results back to the workflow. This would help offload the processing from the workflow and also allow you to take advantage of the more powerful processing capabilities of the server.
  4. Consider alternative CRM data enrichment techniques: You could consider alternative data enrichment techniques, such as using lookup tables, data normalization, or data aggregation, to simplify the data enrichment process. These techniques can help reduce the complexity of your JavaScript code and also make it easier to maintain and update the data.
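On point 4, the two large switch statements from the original post are a good candidate for a lookup table, which is usually easier to maintain than branching logic. A generic JavaScript sketch (the segment codes and messages are the placeholder values from the thread, not real data):

```javascript
// Lookup table keyed by segment code; replaces a large switch statement.
var IR_NOTES = {
  MySegCode:      { IrDescription: "Something",      IrMessage: "Something Descriptive" },
  MyOtherSegCode: { IrDescription: "Something Else", IrMessage: "Something Else Descriptive" }
};
var IR_DEFAULT = { IrDescription: "Missing", IrMessage: "Missing" };

// Enrich one record from the lookup table, falling back to a default
// entry when the segment code is unknown.
function applyIrNotes(record) {
  var note = IR_NOTES[record.segmentCode] || IR_DEFAULT;
  record.IrDescription = note.IrDescription;
  record.IrMessage = note.IrMessage;
  return record;
}
```

Adding a new segment then means adding one table entry rather than another case block, and the same table could even be moved into a database table and joined in an enrichment activity instead of JS.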

It's important to consider the specific requirements and constraints of your workflow when choosing the best approach for data enrichment. It's also a good idea to seek professional advice if you're unsure of the best way to proceed.