SOLVED

Japanese/double-byte values become question marks when using JcrUtils.readFile


Level 4

Hi, I would like to ask how I can retrieve Japanese (or other double-byte) characters and symbols (like the degree sign) and save them to a CSV. With my current approach they end up displaying as question marks.

Here is the code snippet I'm using:

InputStream in = JcrUtils.readFile(reportNode);
InputStreamReader reader = new InputStreamReader(in);
CSVParser parser = CSVReportPrinter.parse(reader);
List<CSVRecord> records = parser.getRecords();

I also tried adding char encoding:

InputStreamReader reader = new InputStreamReader(in, "UTF-8");

but I still got the same question marks for those characters in the CSV.


2 Replies


Community Advisor

Ensure that the charset you use when reading the input stream matches the encoding the file was stored with, and likewise use an appropriate encoding such as UTF-8 when writing the CSV back out, so that double-byte characters and symbols are stored correctly.
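
For example, here is a rough sketch with the charset spelled out on both the read and the write side. It assumes Apache Commons CSV in place of the CSVReportPrinter wrapper from the original snippet, and the target node and file name ("export.csv") are only illustrative:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.jcr.Node;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import org.apache.jackrabbit.commons.JcrUtils;

public class CsvRoundTrip {

    public void copyCsv(Node reportNode, Node targetFolder) throws Exception {
        // Read: decode the stored file explicitly as UTF-8
        List<CSVRecord> records;
        try (InputStream in = JcrUtils.readFile(reportNode);
             InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
             CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
            records = parser.getRecords();
        }

        // Write: encode the output explicitly as UTF-8 as well
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
             CSVPrinter printer = new CSVPrinter(writer, CSVFormat.DEFAULT)) {
            for (CSVRecord record : records) {
                printer.printRecord(record);
            }
        }

        // Store the result back in the repository ("export.csv" is just an example name)
        JcrUtils.putFile(targetFolder, "export.csv", "text/csv",
                new ByteArrayInputStream(out.toByteArray()));
        targetFolder.getSession().save();
    }
}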



Arun Patidar


Correct answer by
Level 4

So, I've found out why it wasn't encoding properly when exporting to CSV.
After adding UTF-8 to the OutputStreamWriter I still saw the issue: different garbled characters were now present in the CSV.
After some checking, I found an article about opening a UTF-8 CSV in Excel:
https://answers.microsoft.com/en-us/msoffice/forum/all/how-to-open-utf-8-csv-file-in-excel-without-m...
That partially solves the problem, but it's still a hassle if you just want to open the file by double-clicking.

With that, I needed to add a UTF-8 BOM to the CSV:
https://stackoverflow.com/questions/4389005/how-to-add-a-utf-8-bom-in-java
I'm still torn on whether I should add it via a BufferedWriter, but for now this works for me:

ByteArrayOutputStream os = new ByteArrayOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(os, StandardCharsets.UTF_8);
osw.write('\ufeff');
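
For reference, a minimal sketch of how the BOM write can fit together with the rest of the CSV output (again assuming Apache Commons CSV; the Japanese row values are made-up examples):

import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

public class BomCsvSketch {

    public static byte[] buildCsv() throws Exception {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        OutputStreamWriter osw = new OutputStreamWriter(os, StandardCharsets.UTF_8);

        // Write the UTF-8 BOM first so Excel picks up the encoding
        // when the file is opened by double-clicking
        osw.write('\ufeff');

        try (CSVPrinter printer = new CSVPrinter(osw, CSVFormat.DEFAULT)) {
            // Example rows with Japanese text and a degree symbol
            printer.printRecord("都市", "気温");
            printer.printRecord("東京", "25°C");
        }

        // The resulting bytes can then be written back to the JCR or streamed in a response
        return os.toByteArray();
    }
}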