
SOLVED

hit data has 'corrupt?' data in the lookup file


Level 4

We get a daily data feed for our clickstream data.  The operating_systems.tsv file has some rows that contain a single value rather than an index along with the operating system.  Does anybody know what line 37446 is?  And can we expect this kind of data moving forward?  This is breaking our ingestion scripts.

 

[Screenshot attached: valerie_anders_0-1680537645003.png]

1 Accepted Solution


Correct answer by
Community Advisor and Adobe Champion

That I don't know. As far as I know, we have no control over those lookup files; they are generated by Adobe according to its own logic. I know of no control that we have as system admins that allows us to exclude bad data.

 

I suspect this is the result of several fields being affected at once, which causes the columns to shift strangely in the TSV file. Think of it as a tab-delimited list: if a process is supposed to extract columns 5 and 6, but extra tabs have been added earlier in the row, it could really be getting the data from columns 2 and 3, even though they look like 5 and 6 to the process. I've seen similar issues with a security tool we ran in the past; even in my Adobe reports, content that should have been in one dimension got shifted to another.
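To make the column-shift idea concrete, here is a minimal sketch in Python. The column names and row values are made up for illustration and are not Adobe's actual feed layout; the point is only that a stray tab inside one value pushes every later value one position to the right when the row is split by position.

```python
# Hypothetical column layout; these names are NOT taken from the Adobe feed.
columns = ["visitor_id", "page_url", "os_id", "browser_id"]

# A well-formed row: four values separated by three tabs.
clean_row = "v1\thttps://example.com/\t37\t12"

# The same row with one extra tab-separated value injected after the URL.
shifted_row = "v1\thttps://example.com/\tBCC:x\t37\t12"


def parse(line):
    """Split on tabs and pair each value with a column name by position."""
    return dict(zip(columns, line.split("\t")))


clean = parse(clean_row)
shifted = parse(shifted_row)

print(clean["os_id"])         # value that really is the OS id
print(shifted["os_id"])       # injected text now sits in the OS column
print(shifted["browser_id"])  # the real OS id has slid into the next column
```

Whether this is what actually happened in the lookup file is a guess; only Adobe can confirm how the row was produced.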

 

There are two options I can see:

 

1. Open a ticket with Client Care. Maybe they can explain the bad data and compensate for it. If this is an issue for you, odds are it is an issue for others as well, and it may take only a simple tweak to fix the output.

 

2. Create a "sanitize" job that checks for and removes bad lines of data like this one.
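For the second option, a "sanitize" step could look something like the sketch below. It assumes a valid lookup row is an integer index, exactly one tab, and a non-empty label, which matches the format described in this thread but is not an official Adobe specification. The function name `sanitize_lookup` and the sample rows are hypothetical.

```python
import re

# Assumption: a valid row is "<integer index><TAB><label>" with no further tabs.
VALID_ROW = re.compile(r"\d+\t[^\t]+")


def sanitize_lookup(lines):
    """Split lookup rows into well-formed rows and rejected rows.

    Returns (good, bad), where bad holds (line_number, text) pairs so the
    rejected rows can be logged or attached to a support ticket.
    Blank lines are skipped silently.
    """
    good, bad = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if VALID_ROW.fullmatch(line):
            good.append(line)
        elif line:
            bad.append((number, line))
    return good, bad


# Made-up sample rows, including one single-value row like the one reported.
sample = [
    "1\tWindows 10\n",
    "BCC:example\n",
    "2\tmacOS\n",
    "\n",
    "3\tLinux\textra\n",
]

good, bad = sanitize_lookup(sample)
for number, text in bad:
    print(f"rejected line {number}: {text!r}")
```

In practice the input would be the lines of operating_systems.tsv read from disk, and the `good` rows would be written to a cleaned copy for the ingestion script. Keeping the rejected rows instead of discarding them makes it easier to show Adobe exactly what came through.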

 

I wish there were a third option that didn't involve development work (from either Adobe or your team) and the time to get it completed, tested, and deployed.

 

 

Sadly, raw data feeds don't allow the use of segments, which you could otherwise use to dig into the data and find a common way to exclude the rows affected by this issue.


5 Replies


Community Advisor and Adobe Champion

That looks like someone may have been running a vulnerability test on your site. From a quick Google search, Oastify is coming back as a known security testing tool. While anyone can technically run such a test on any site, this was probably initiated by internal staff doing a test. I wouldn't rule out it happening again, though it will likely be infrequent (depending on which team is running the test and how often their mandate requires it).


Level 2

Our IT department says they are unable to load the file, since they expect it to always have an index, then a tab, followed by the operating system.  Why would this data, even if it is odd or unexpected, be added to a lookup file in this manner?  Is there a way to remove it from the lookup file?  Or could the data instead be something like...

index_number  BCC:<whatever>

 

Or even...

index_number  unrecognized

 

Or

index_number Other



Level 4

Thank you.  I opened a ticket with client care.  Our IT department wants to hear back from Adobe before making changes to our ingestion scripts.


Community Advisor and Adobe Champion

Good luck, and yes, I would likely do the same: wait for Adobe to respond before changing your ingestion process.

 

For all we know, this type of data was always stripped out, and a recent change affected the logic in a way that allowed it to sneak through...