Adobe Analytics

valerie_anders · 4/3/23

We get a daily data feed for our clickstream data. The operating_systems.tsv file has some rows that contain a single value rather than an index along with the operating system. Does anybody know what line 37446 is? And can we expect this kind of data moving forward? This is breaking our ingestion scripts.

Jennifer_Dungan · 4/3/23

That looks like someone may have been running a vulnerability test on your site... from a quick Google search, Oastify is coming back as a known security testing tool... while anyone can technically run such a test on any site, this was probably initiated by internal staff doing a test... I wouldn't rule out it happening again, though it will likely be infrequent (depending on which team is running the test and what their mandate is for how often it should be run).

valeriea6918303 · 4/5/23

Our IT department says they are unable to load the file since they expect that file to always have an index, then a tab, followed by the operating system. Why would this data, even if it is odd or unexpected, be added to a lookup file in this manner? Is there a way to remove it from the lookup file? Or could the data be something like...

index_number BCC:<whatever>

Or even...

index_number unrecognized

Or

index_number Other

Jennifer_Dungan · 4/5/23

That I don't know... As far as I know, we have no control over those lookup files... they are generated by Adobe according to their own logic... I know of no control that we have as system admins that allows us to exclude bad data...

I suspect this is the result of multiple fields being impacted, which causes the columns to shift strangely in the TSV file (think of this as a tab-delimited list... if the process is supposed to extract columns 5 and 6, but extra tabs have been added earlier in the row, it could really be getting the data from "2 and 3", even though they look like 5 and 6 to the process). I've seen similar issues with a security tool we ran in the past... even in my Adobe reports, content that should have been in one dimension got shifted to another...
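To make the column shift concrete, here is a small Python sketch. The row contents and column positions are made up for illustration (they are not the real data feed layout); the only point is that a stray tab inside one value moves everything after it one position to the right.

```python
# Hypothetical five-field row; the consumer reads positions 3 and 4
# (0-based) as the OS index and OS name. Not the real feed layout.
clean_fields = ["a", "b", "c", "42", "Windows 10"]

# Same row, but the second value arrived with a tab embedded in it.
dirty_fields = ["a", "b\tINJECTED", "c", "42", "Windows 10"]


def pick(row: str, i: int, j: int) -> tuple:
    """Split a tab-delimited row and return the values at positions i and j."""
    cols = row.split("\t")
    return cols[i], cols[j]


clean_pair = pick("\t".join(clean_fields), 3, 4)
shifted_pair = pick("\t".join(dirty_fields), 3, 4)

print(clean_pair)    # ('42', 'Windows 10')
print(shifted_pair)  # ('c', '42')
```

The process still reads "positions 3 and 4", but in the second row those positions now hold what used to sit one column earlier.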

There are two options I can see:

1. Open a ticket with Client Care... maybe they can explain the bad data and compensate for it? If this is an issue for you, odds are it could be an issue for others... and it may be a simple tweak to fix the output.

2. Create a "sanitize" job to check for and remove bad lines of data like this...

I wish there were a third option that didn't involve some development work (from either Adobe or your team), plus the time to get it completed, tested and deployed...

Sadly, Raw Data Feeds don't allow for the use of segments, where you could try to dig into the data and find a common way to exclude the data affected by this issue...
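
For option 2, a "sanitize" job could be sketched roughly like this in Python. It assumes a valid lookup row is an integer index, exactly one tab, then a non-empty name, which matches what the IT team described above; the function names and file paths are hypothetical, and the rule may need adjusting for the other lookup files.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a valid row is "<integer index><TAB><non-empty name>"
# with exactly one tab. Anything else is treated as bad data.
VALID_ROW = re.compile(r"\d+\t[^\t]+")


def split_rows(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return (good rows, [(line number, bad row), ...]) for a lookup file."""
    good: List[str] = []
    bad: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if VALID_ROW.fullmatch(line):
            good.append(line)
        else:
            # Keep the line number so the rejects can be reported or logged.
            bad.append((lineno, line))
    return good, bad


def sanitize_file(src: str, dst: str) -> List[Tuple[int, str]]:
    """Write only the valid rows of src to dst and return the rejected rows."""
    with open(src, encoding="utf-8", errors="replace") as fh:
        good, bad = split_rows(fh)
    with open(dst, "w", encoding="utf-8", newline="\n") as out:
        out.writelines(line + "\n" for line in good)
    return bad
```

Running something like `sanitize_file("operating_systems.tsv", "operating_systems.clean.tsv")` before the load step would let the ingestion continue, and the returned list of rejected rows (with line numbers) could be attached to a Client Care ticket. Note that any hits pointing at a dropped row's index, if it has one, would no longer resolve to a name, so the loader needs a fallback such as "Other".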