SAINT classification quote escaping


Level 2

I have a question about quoting values in SAINT files. The docs page here says that the following special characters require quoting:

 

Special characters, such as tabs, newlines, and quotes, can be embedded within a cell provided the v2.1 file format is specified and the cell is properly escaped. Special characters include:

\t    tab character
\r    form feed character
\n    newline character
"     double quote

 

Do spaces count as special characters, or is that list exhaustive? For example, if I have the value linkedin post, does it need to be written as "linkedin post" in the SAINT file? If all the columns are text, can I quote every column?

 

Can you quote the Key column values too?

 

 

3 Replies


Community Advisor

I would say that spaces are not considered "special characters". Spaces appear in the example in that documentation, yet they aren't in the list of characters to escape.

Basically, the files that Adobe uses are .tsv (tab-separated values) files.

 

These four characters are listed as needing to be escaped for the following reasons:

  • \t tab character: this is the delimiter for the file. Any unescaped tab in your data splits the cell in two, so data that belongs in column D, for example, spills into column E, and everything from E onward shifts into the wrong columns.
  • \r form feed character / \n newline character: these are essentially "new line" characters. Different systems use them in different combinations, which is why more than one is listed (note that \r is usually called a carriage return rather than a form feed). You don't want them to trigger in the middle of a row. If column M contains one or both of them, the data from column N onward is processed as a new row starting at column A, which cuts the current row short and creates a corrupt new row.
  • " double quote: you can't see this in Excel, but once data is saved in csv or tsv format, string values can be wrapped in double quotes while numeric values are not. Say you have the value I'm using "quotes" for emphasis. It will come out in the feed as "I'm using "quotes" for emphasis". To a parser this looks like several values with an unrecognized piece in the middle: it sees "I'm using " and " for emphasis", and it doesn't know how to parse this as one column or what to do with the stray text between the quotes.
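To make the rules above concrete, here is a minimal Python sketch (not Adobe tooling) that writes and re-reads a tab-delimited file with the standard csv module. It assumes the importer follows the same Excel-style quoting convention as Python's csv module, which nothing in this thread confirms; the sample rows and the helper names to_tsv and from_tsv are hypothetical. Only cells that contain a tab, a newline, or a double quote get wrapped in quotes, so a value with plain spaces such as linkedin post is written as-is.

```python
import csv
import io

# Hypothetical sample rows: a key column plus one classification column.
rows = [
    ["Key", "Channel"],
    ["linkedin post", "Social"],                      # spaces only: no quoting needed
    ["tab\there", "Has a tab"],                       # embedded delimiter
    ["multi\nline", "Has a newline"],                 # embedded newline
    ['I\'m using "quotes" for emphasis', "Quoted"],   # embedded double quotes
]


def to_tsv(rows):
    """Serialize rows as tab-separated text, quoting only cells that need it."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",
        quoting=csv.QUOTE_MINIMAL,  # quote a cell only when it contains a special character
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


def from_tsv(text):
    """Parse tab-separated text back into rows, honoring Excel-style quotes."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


tsv_text = to_tsv(rows)
round_tripped = from_tsv(tsv_text)
print(tsv_text)
```

If you would rather quote every cell, csv.QUOTE_ALL does that on the writing side, but whether the SAINT importer accepts a fully quoted file (including the Key column) is something to verify with a test upload.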


Level 2

Interesting, thanks Jennifer.

 

The example in the screenshot in the docs uses v1.0. If v2.0 takes " literally, I'm not sure how v1.0 handled it.

 

Is there any harm in quoting every string in the tsv when using v2.1? There aren't any numeric values, but there are some blank strings.


Community Advisor

Hi @upliftertom,

 

You shouldn't need to add quotes to the text yourself; that happens when the file is saved. The rules I outlined above are mostly general rules of thumb for tsv files.

 

I see what you mean about the docs saying that v2.0 and v2.1 handle quotes differently:

  • v2.0 ignores quotes and assumes they are all part of the keys and values specified. For example, consider this value: "This is ""some value""". v2.0 would interpret this literally as: "This is ""some value""".
  • v2.1 tells classifications to assume that quotes are part of the file formatting used in Excel files. So v2.1 would format the above example to: This is "some value".
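One way to read the two bullets above is that v2.1 applies the usual Excel convention, where a quoted cell has its outer quotes stripped and each doubled quote collapsed to a single one. The sketch below reproduces both readings of the docs' example (written here with straight double quotes) using Python's csv module. This is only an analogy under that assumption, not Adobe's parser, and the function names read_literal and read_excel_style are made up for illustration.

```python
import csv
import io

# The example cell from the docs, with straight double quotes.
raw_cell = '"This is ""some value"""'


def read_literal(cell):
    """v2.0-like reading (assumed): quotes have no special meaning."""
    reader = csv.reader(io.StringIO(cell), delimiter="\t", quoting=csv.QUOTE_NONE)
    return next(reader)[0]


def read_excel_style(cell):
    """v2.1-like reading (assumed): outer quotes are formatting, "" means one quote."""
    reader = csv.reader(io.StringIO(cell), delimiter="\t")
    return next(reader)[0]


print(read_literal(raw_cell))      # "This is ""some value"""
print(read_excel_style(raw_cell))  # This is "some value"
```

If this reading is right, a v2.1 file saved from Excel should round-trip values containing quotes without manual escaping, but a small test upload is still the safest check.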


To be honest, I am not sure what this means in practice. I would try some tests in a test environment if I were you.