Hit data has 'corrupt?' data in the lookup file

Question

We get a daily data feed for our clickstream data. The operating_systems.tsv file has some rows that contain a single value rather than an index along with the operating system. Does anybody know what line 37446 is? And can we expect this kind of data moving forward? This is breaking our ingestion scripts.

Jennifer_Dungan · Accepted Answer

That I don't know. As far as I know, we have no control over those lookup files: they are generated by Adobe, using Adobe's own logic. I know of no setting that would let us, as system admins, exclude bad data.

I suspect this is the result of multiple fields being impacted, which causes the columns to shift strangely in the TSV file. Think of it as a tab-delimited list: if the process is supposed to extract columns 5 and 6, but extra tabs have been added earlier in the row, it could really be getting the data from "2 and 3", even though they look like 5 and 6 to the process. I've seen similar issues with a security tool we ran in the past. Even in my Adobe reports, content that should have been in one dimension got shifted to another.
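To make the column shift concrete, here is a small Python sketch. The rows and values are invented for illustration and are not taken from a real Adobe data feed; the point is only that one stray tab earlier in a row changes what a fixed-position reader sees.

```python
# Hypothetical six-column, tab-separated row (values invented for illustration).
clean_row = "a\tb\tc\td\tWindows\tChrome"

# The same row, except the second value contains an embedded tab,
# as could happen if a scanner payload with a tab character is captured.
dirty_row = "a\tb1\tb2\tc\td\tWindows\tChrome"


def columns_5_and_6(row):
    """Read the 5th and 6th tab-separated fields by position."""
    fields = row.split("\t")
    return fields[4], fields[5]


print(columns_5_and_6(clean_row))  # the intended two values
print(columns_5_and_6(dirty_row))  # everything has shifted one column right
```

With the extra tab in place, the positional reader picks up neighboring values instead of the intended ones, which is how unrelated content can end up in a lookup or dimension.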

There are two options I can see:

1. Open a ticket with Client Care. Maybe they can explain the bad data and compensate for it? If this is an issue for you, odds are it could be an issue for others, and it may be a simple tweak to fix the output.

2. Create a "sanitize" job to check for and remove bad lines of data like this.
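For the second option, a minimal sketch of a sanitize step in Python might look like the following. It assumes each valid row in operating_systems.tsv is exactly two tab-separated fields, a numeric index and a non-empty name; that layout is inferred from the question, so adjust the check if your file differs. The function name `sanitize_lookup` and the sample values are hypothetical.

```python
def sanitize_lookup(lines):
    """Split lookup rows into (good, bad).

    Assumption: a valid row is "<numeric index>\t<non-empty name>".
    Bad rows are returned with their 1-based line number for logging.
    """
    good, bad = [], []
    for line_no, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        fields = row.split("\t")
        is_valid = (
            len(fields) == 2
            and fields[0].strip().isdecimal()
            and fields[1].strip() != ""
        )
        if is_valid:
            good.append(row)
        else:
            bad.append((line_no, row))
    return good, bad


# Invented sample rows; the second one mimics a single-value line.
sample = ["1\tWindows 10\n", "payload.example.invalid\n", "2\tmacOS\n", "\n"]
good, bad = sanitize_lookup(sample)
print(good)
print(bad)
```

In a real job you would pass an open file handle instead of `sample` (iterating a file yields lines the same way) and write the rejected rows to a log, so you can attach them to a Client Care ticket rather than dropping them silently.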

I wish there were a third option that didn't involve some development work (from either Adobe or your team), plus the time to get it completed, tested, and deployed.

Sadly, Raw Data feeds don't allow for the use of segments, where you could try to dig into the data and find a common way to exclude the data affected by this issue.

Jennifer_Dungan · Answer

That looks like someone may have been running a vulnerability test on your site. From a quick Google search, Oastify is coming back as a known security testing tool. While anyone can technically run such a test on any site, this was probably initiated by internal staff doing a test. I wouldn't rule out it happening again, though it will likely be infrequent (depending on which team is running the test and what their mandate is for how often it should be run).
