Hi Team,
I have a Data Feeds job that moves data hourly to an SFTP server, and from there into a Data Lake.
We are seeing the data land as a string, but in a Unicode-escaped format like the sample below. Is there any configuration on your end that would make this a regular string?
"\u0006\u001b58\u001d�H�4;5;�x{HJ�0aQ�O-\u0002�n��\u001e'=s��\u0016\u0014ղ*7ӥV\u001bJ�\u0019,¯h?
If not, is there any alternative we can use to get the data as a regular string?
Thanks!
We too are using hourly Data Feeds and pulling that data into our Data Lake... like you, our files are .tsv and zipped... no one on our Data Engineering team has had any issues like what you are describing above.
Which is why I am wondering if you might be using a character encoding that is not based on an alphabet that encodes easily as UTF-8, such as a Cyrillic language like Russian or Ukrainian, or a glyph-based language like Mandarin, Cantonese, Japanese, etc...
Adobe data, aside from some of the standard date-based fields, should all be string values (all props and dimensions are text strings)... so unless there is an encoding issue with another language, it seems more likely that something is happening in your ETL process...
Without seeing it, however, it's very hard to diagnose... perhaps Customer Care would be a better route, as they have more information about your data, and you can share details there that you can't share on a public forum.
You said you see this as text in the landing area of your Data Landing Zone? There really aren't any formatting options on the Data Feeds themselves that I can see... I think you may have to look at the ETL process in your own system to make sure the data is properly encoded and translated.
Are you dealing with a Cyrillic or glyph-based language? Do you know what the code above should equate to?
Also, is the sample something you are seeing on every row, or just a few random rows? The latter could potentially be content generated by someone "hacking" or running a security scan of your site (periodically I will see what looks like SQL injection code from our security software checking for vulnerabilities).
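The escaped control characters in the sample look less like a language-encoding problem and more like compressed or binary bytes being decoded as if they were text. One quick sanity check is to look at the first bytes of the landed file. This is only a minimal sketch; the helper name and the set of format checks are my own assumptions, not anything built into Data Feeds:

```python
def classify(first_bytes: bytes) -> str:
    """Guess the container format of a landed file from well-known magic numbers.

    Reading a still-compressed file as text would produce garbled
    "\\u00xx"-style escapes like the sample above.
    """
    if first_bytes.startswith(b"PK\x03\x04"):
        return "zip archive (decompress before decoding)"
    if first_bytes.startswith(b"\x1f\x8b"):
        return "gzip stream (decompress before decoding)"
    try:
        first_bytes.decode("utf-8")
        return "plain UTF-8 text"
    except UnicodeDecodeError:
        return "binary or non-UTF-8 data"
```

You could feed this the first few bytes of the file as it sits in the landing zone (e.g. `classify(open(path, "rb").read(16))`) to rule compression in or out before digging into the ETL code.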
We are collecting real-time data and moving it hourly to the data lake using Data Feeds. The files, which are in .tsv format, are archived to .zip format before moving. When our system tries to read the files, it gets strings in this encoded format. Is there a way Data Feeds can deliver the data as regular strings? Or is there any alternative for this?
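If the reading system is pulling bytes straight out of the .zip without decompressing first, that would produce exactly this kind of garbled output. A minimal sketch of reading the archived .tsv correctly in Python, assuming the reader can use the standard library (the file path and function name are hypothetical):

```python
import io
import zipfile


def read_feed_rows(zip_path: str):
    """Open a zipped data-feed archive and yield each .tsv row as a list of strings."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".tsv"):
                continue
            with archive.open(name) as raw:
                # Decompress and decode in one pass instead of treating
                # the raw archive bytes as text.
                text = io.TextIOWrapper(raw, encoding="utf-8")
                for line in text:
                    yield line.rstrip("\n").split("\t")
```

The key point is decompressing before decoding; whichever ETL framework you use should have an equivalent "open the archive, then decode as UTF-8" step.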