Header Row on Data Feed

11/28/11

Hello,

 

    I am currently developing a data feed parser and have noticed a few problems along the way. The first is that the feeds do not include a header row naming each column. Instead, there is documentation that lists the column names, and the advice given to me by Omniture staff was to use this document to find out what the columns are. The problem with this approach is that my parser must take it on blind faith that the columns specified in its settings actually match the columns present in the data feed.

However, if an encoding error occurs, an odd character interpreted as a comma, tab, or double-quote would shift the remaining fields, creating a negative or positive offset. Under such conditions, the parser would see the offset as an additional column, and one for which it does not have settings. Conversely, if there were a header row, I could trap the error by checking the number of columns present on each row against the number specified in the header. I am unable to do this currently because not every cell of row one contains data. Therefore, in order for the parser to know the true number of columns in the file, it must consume the entire file to determine the maximum number of columns present on any row. As it happens, that is too intensive an operation to perform on such large files.
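To illustrate the check I have in mind, here is a minimal sketch in Python. It assumes a hypothetical feed whose first line is a tab-separated header row, which is exactly the feature that does not exist today; the column names and sample rows are made up for the example. Because it compares each row against the header as it streams, it never needs to read the whole file up front.

```python
def find_offset_rows(lines):
    """Yield (line_number, field_count) for rows whose width differs from the header.

    Assumes the first line is a tab-separated header row (hypothetical;
    the current feed does not provide one).
    """
    rows = iter(lines)
    header = next(rows).rstrip("\r\n").split("\t")
    expected = len(header)
    # Data rows start on line 2 because line 1 is the header.
    for line_number, line in enumerate(rows, start=2):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != expected:
            yield line_number, len(fields)


# Made-up sample: the last row contains a stray tab, which shifts its columns.
sample_lines = [
    "hit_time_gmt\tpage_url\treferrer\n",
    "1322438400\thttp://example.com/\t\n",
    "1322438401\thttp://example.com/a\tb\thttp://example.org/\n",
]
bad_rows = list(find_offset_rows(sample_lines))
```

The same function works on an open file object, since it only iterates over lines one at a time.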

 

    I think the option of including a header row would be a great feature to add, but in addition, the other two problems with the file concern both how the feed is established and how it updates. At MNGI, we have a large number of report suites, and more are created every week. Given enough time and even a slight lack of diligence, eventually, someone will forget to send the request to Omniture to set up a data feed for a new report suite. The scenario would lead to the presence of holes in the data, unless it was backfilled. Second, it would be nice if we also had the option of allowing the file to update automatically with any additional columns should Adobe eventually add them.

 

In conclusion, it would be great to see three new features:

  1. Option for a header row to specify column names (enables better validation)
  2. Option for the automatic setup of a new data feed whenever a report suite is created (reduces need for maintenance)
  3. Option to use all current columns regardless of individual report suite settings, lack of data, or time when the data feed was set up (reduces need for maintenance, and triggers a new workflow to update the settings for the parser)

Thanks,

Josh Dannemann

jdannemann@medianewsgroup.com

1 Comment

12/30/11

Just want to mention... it seems the data feed uses JavaScript-style escape sequences for tabs, backslashes, and newlines inside field values. This is odd for a TSV, given that illegal characters are typically encapsulated in quotes, but it is not an insurmountable problem. Earlier, I posted a comment noting an error in the data feed, but it turned out not to be an error... just a different format for handling the illegal characters.
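For anyone writing a similar parser, here is a rough sketch of how I would handle this in Python. It assumes the escapes are a backslash followed by `t`, `n`, or another backslash, as in a JavaScript string literal; I have not confirmed the complete list of escapes the feed uses, and the function names are my own.

```python
import re

# Assumed escape set: backslash followed by t, n, or backslash.
_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}
_ESCAPE_RE = re.compile(r"\\([tn\\])")


def unescape_field(field):
    """Replace each assumed escape sequence with the character it stands for."""
    return _ESCAPE_RE.sub(lambda match: _ESCAPES[match.group(1)], field)


def parse_row(line):
    """Split on literal tabs first, then unescape each field."""
    return [unescape_field(f) for f in line.rstrip("\r\n").split("\t")]


# Made-up row with three fields, each containing one escaped character.
raw = r"a\tb" + "\t" + r"C:\\temp" + "\t" + r"line1\nline2" + "\n"
parsed = parse_row(raw)
```

Splitting before unescaping matters: an escaped tab inside a value must not be mistaken for a column separator.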