SOLVED

Classification Rule Limits

Level 2

Hello everyone!

I have implemented a new bucketed dimension using a Traffic Variable classification.

In most cases it works perfectly, but some values end up as 'Unspecified' in the new classified dimension when I expect them to fall into another value. It isn't a problem with the regular expressions, because the same value, such as '1', can appear both in the '1 - 10' item and in Unspecified.

Maybe there is a limit on the number of values in the prop? I have about 8-10 million occurrences per day.
Please let me know if you have any ideas about the cause.

1 Accepted Solution

Correct answer by
Employee Advisor

@AMinakov The lookback window would be one of the reasons for this. Below are some points you need to keep in mind:

  • Sub-classifications are not supported with Classification Rule Builder (CRB).
  • Our current classification system can only export up to 10 million rows at a time.
  • When CRB requests an export, it pulls both classified AND unclassified values, with unclassified values coming through at the end of the export. This means that, over time, the 10 million rows could fill up with classified values before the export ever reaches the unclassified ones.
  • Because the architecture is set up in a way that CRB could be pulling from “n” number of servers, this can lead to inconsistencies as to which servers get picked up and in what order. For that reason, it is very difficult to get to unclassified values.

This is the workaround for those who have more than 10 million classified values for a dimension: export the unclassified values via FTP, in batches of 10 million, and classify them manually.
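
To illustrate the manual step, here is a minimal Python sketch that assigns buckets to one exported batch of unclassified values. It rests on several assumptions that are not from this thread: the export is a tab-delimited file with the key in the first column, the buckets are ten wide ('1 - 10', '11 - 20', and so on), and the output is a simple key/value file. The names `bucket_label` and `classify_export` are hypothetical, and the real classification file layout should be taken from your own export template.

```python
import csv

BUCKET_SIZE = 10  # assumption: buckets are '1 - 10', '11 - 20', ...


def bucket_label(key):
    """Return a bucket such as '1 - 10' for a positive integer key, else ''."""
    text = key.strip()
    if not (text.isascii() and text.isdigit()) or int(text) == 0:
        return ""
    low = (int(text) - 1) // BUCKET_SIZE * BUCKET_SIZE + 1
    return f"{low} - {low + BUCKET_SIZE - 1}"


def classify_export(src, dst):
    """Read keys from the first tab-separated column of src and write
    'key<TAB>bucket' rows to dst. Returns the number of rows written.

    The key is written back unchanged so that it still matches the
    value that was collected.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    written = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        label = bucket_label(row[0])
        if label:
            writer.writerow([row[0], label])
            written += 1
    return written
```

Run it once per exported batch, passing open file objects for the batch and for the output file. Keys that do not parse as positive integers are skipped rather than guessed at, so they remain visible as unclassified for manual review.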

6 Replies

Community Advisor

First question: how are you classifying things?

Are you using Classification Rule Builder, or uploading files to be classified?

Level 2

Pablo, thank you for your attention!
I used Classification Rule Builder.

Community Advisor

I am not aware of any limitation based on the sheer number of values processed.

I suspect that, at the volume you are seeing, there are simply instances/variations that look as though they should fall into one classification but are somehow being processed as Unspecified.

Could there be some logic where, when so many values are sent, some get corrupted and end up with extra characters in them?

It is tough to understand without examples and without seeing the classification logic. I do know I have had to readjust my classification regex from time to time, because new variations can behave as you describe.
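
One way to check the extra-characters theory is to run a sample of raw values through the same patterns offline. The sketch below is only an illustration: the two rules are hypothetical stand-ins for the real rule set, the names `classify` and `explain` are made up, and Python's `re` module is not the engine Classification Rule Builder uses, so treat the result as a hint rather than proof.

```python
import re

# Hypothetical rules: (pattern, classified value). Replace with your own.
RULES = [
    (re.compile(r"[1-9]|10"), "1 - 10"),
    (re.compile(r"1[1-9]|20"), "11 - 20"),
]


def classify(value):
    """Return the first matching label, or 'Unspecified' if none matches."""
    for pattern, label in RULES:
        # fullmatch requires the whole value to match, so stray
        # leading or trailing characters cause a miss.
        if pattern.fullmatch(value):
            return label
    return "Unspecified"


def explain(value):
    """Show the raw value and whether trimming whitespace changes the result."""
    label = classify(value)
    trimmed_label = classify(value.strip())
    return {
        "raw": repr(value),  # repr exposes spaces, tabs, and newlines
        "label": label,
        "label_if_trimmed": trimmed_label,
        "hidden_chars": label == "Unspecified" and trimmed_label != "Unspecified",
    }
```

If `explain` reports `hidden_chars` as true for a value, surrounding whitespace is the likely reason that value misses its bucket.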

Community Advisor

Do the "Unspecified" ones appear when looking at the current day's data only, or with old data too? If old data, how far back do "Unspecified" appear?

Level 2

In the current data too. I thought about the lookback window as well, but the problem is most likely caused by the volume of data, because the percentage of Unspecified decreases over time.
