Japanese/double-byte values become question marks when using JcrUtils.readFile | Community
January 18, 2024
Solved

Japanese/double-byte values become question marks when using JcrUtils.readFile

  • January 18, 2024
  • 1 reply
  • 1623 views

Hi, I would like to ask how I can retrieve Japanese (or other double-byte) characters and symbols (such as the degree sign) and save them to a CSV. When I tried the code below, they ended up displayed as question marks.

Here is the code snippet that I'm using:

InputStream in = JcrUtils.readFile(reportNode);
InputStreamReader reader = new InputStreamReader(in);
CSVParser parser = CSVReportPrinter.parse(reader);
List<CSVRecord> records = parser.getRecords();

I also tried specifying the character encoding:

InputStreamReader reader = new InputStreamReader(in, "UTF-8");

but I still got the same question marks for those characters in the CSV.

1 reply

arunpatidar
Community Advisor
January 18, 2024

Ensure that the encoding used to read the input stream matches the encoding of your CSV file. When writing the CSV file back, also use an explicit encoding that can represent these characters, such as UTF-8, so that double-byte characters and symbols are stored correctly.

Arun Patidar
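
To illustrate the advice above, here is a minimal sketch of decoding a stream with an explicit charset, and of how question marks can appear when text is encoded with a charset that cannot represent it. The class name CharsetRoundTripSketch, the readAll helper, and the sample text are assumptions made for this example; they are not part of JcrUtils or the original poster's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
class CharsetRoundTripSketch {

    /** Decodes the whole stream using the given charset. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (InputStreamReader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two kanji, a comma, then "25", a degree sign, and "C"
        String original = "\u6771\u4EAC,25\u00B0C";

        // Matching charsets on both sides: the text survives the round trip
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(original)); // true

        // Encoding with a charset that lacks these characters substitutes '?'
        byte[] ascii = original.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ??,25?C
    }
}
```

If the question marks already appear at this stage, the loss probably happened on the encoding (writing) side, since fixing the charset on the reader cannot recover characters that were replaced earlier.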
LyonMartin
Author
Accepted solution
January 24, 2024

So, I've found out why the output wasn't encoded properly when exporting to CSV.
After setting UTF-8 on the OutputStreamWriter, I still encountered the issue, only with different characters showing up in the CSV.
After some checking, I found an article about opening a UTF-8 CSV in Excel:
https://answers.microsoft.com/en-us/msoffice/forum/all/how-to-open-utf-8-csv-file-in-excel-without-mis/1eb15700-d235-441e-8b99-db10fafff3c2
That partially solves the problem, though it's still a hassle if you want the file to open correctly by default.

With that, I needed to add a UTF-8 BOM to the CSV:
https://stackoverflow.com/questions/4389005/how-to-add-a-utf-8-bom-in-java
I'm still undecided on whether I should write it through a BufferedWriter, but for now, this works for me:

ByteArrayOutputStream os = new ByteArrayOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(os, StandardCharsets.UTF_8);
// Write the byte order mark (U+FEFF) first so Excel detects the file as UTF-8
osw.write('\ufeff');
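
For readers who want to see the whole flow, here is a minimal self-contained sketch that builds on the snippet above: it writes the BOM first, then the CSV lines, and returns the resulting bytes. The class name BomCsvSketch, the toUtf8WithBom helper, and the sample rows are assumptions for illustration and do not come from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class BomCsvSketch {

    /** Encodes the given lines as UTF-8, prefixed with a byte order mark. */
    static byte[] toUtf8WithBom(List<String> csvLines) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (Writer osw = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            // U+FEFF must be the first character written; UTF-8 encodes it as EF BB BF
            osw.write('\uFEFF');
            for (String line : csvLines) {
                osw.write(line);
                osw.write("\r\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the bytes are read
        return os.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8WithBom(Arrays.asList("city,temp", "\u6771\u4EAC,25\u00B0C"));
        // Show the first three bytes in hex
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // EF BB BF
    }
}
```

On the BufferedWriter question: wrapping the OutputStreamWriter in a BufferedWriter should not change the bytes produced, as long as the BOM is still the first character written and the writer is flushed or closed before the bytes are used.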