


Hi,
I am creating a migration tool in Java to migrate content from another CMS into AEM pages. The data is in JSON format and consists of some metadata and HTML areas.
After migration, I convert the HTML areas into text components in AEM; however, the data does not display properly on the page.
After decoding the HTML as UTF-8, I see some weird characters such as �??
Is there a way to handle this in Java?
I tried using StringEscapeUtils and Jsoup without any success.
Thanks,
Sam
How are you getting the data? If it comes through a servlet, you can set the response encoding like this:
response.setCharacterEncoding("UTF-8");
Also, if the displayed text shows HTML tags that should have been rendered rather than printed, add the @ context='html' option to the expression that outputs the text in your HTL (Sightly) script, for example ${properties.text @ context='html'}.
If the above doesn't help, let me know how your Java code fetches the JSON.
Hi Ibshikha,
Can you provide details about adding @ context='html' to the text content? Is there a way of adding it programmatically in the JCR?
I am building the pages programmatically.
Yes, I am using a servlet, and I have already set the character encoding (UTF-8) at the response level; however, this does not resolve the issue. I am using the code below to fetch the JSON from the API.
StringBuilder json = new StringBuilder();
URL url = new URL(src);
URLConnection tc = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String line1 = in.readLine();
json.append(line1);
return json.toString().trim();
The above method fetches the JSON from the API into a StringBuilder. The API URL is passed in as src.
Please let me know if you have a solution to try.
Thanks,
Samiksha
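One thing worth checking in a fetch routine like the one above is the charset used to decode the response. An InputStreamReader created without an explicit charset uses the platform default, which can turn UTF-8 bytes into garbage characters before any HTML decoding happens. The following is a minimal sketch, assuming the API serves UTF-8; the class name JsonFetchSketch and the helper readAll are hypothetical names used only for illustration. It also reads every line rather than only the first.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JsonFetchSketch {

    // Hypothetical helper: reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream stream, Charset charset) throws IOException {
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            // Read every line, not just the first one.
            while ((line = in.readLine()) != null) {
                json.append(line).append('\n');
            }
        }
        return json.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Simulated API response: JSON containing a non-ASCII character, encoded as UTF-8.
        byte[] utf8Bytes = "{\"body\":\"<p>caf\u00e9</p>\"}".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset preserves the text.
        String good = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        // Decoding with a mismatched charset produces mojibake.
        String bad = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(good.contains("caf\u00e9")); // prints true
        System.out.println(bad.contains("caf\u00e9"));  // prints false
    }
}
```

With a real connection, the same helper could be called as readAll(url.openConnection().getInputStream(), StandardCharsets.UTF_8), provided the API really does respond in UTF-8.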
Hello @samikshaa223429,
As per my understanding, your HTML areas contain some non-standard special characters that are non-ASCII. You can remove them while converting the HTML area to the text component's text by replacing every non-ASCII character with an empty string, something like this:
String textContent = htmlContent.replaceAll("[^\\p{ASCII}]", "");
Thanks,
Venkat.M
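To make the effect of that replacement concrete, here is a small sketch; the class name AsciiStripSketch and the method stripNonAscii are hypothetical names used only for illustration. Note that the pattern removes every non-ASCII character, including legitimate accented letters and typographic punctuation, so it is lossy and may not suit content that has to keep such characters.

```java
public class AsciiStripSketch {

    // Hypothetical wrapper around the replaceAll call suggested above.
    static String stripNonAscii(String htmlContent) {
        // \p{ASCII} matches code points 0x00-0x7F; the negated class matches everything else.
        return htmlContent.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Input contains e-acute (U+00E9), an en dash (U+2013) and i-diaeresis (U+00EF).
        String input = "<p>caf\u00e9 \u2013 na\u00efve</p>";
        // All three non-ASCII characters are dropped; the markup and spaces remain.
        System.out.println(stripNonAscii(input)); // prints <p>caf  nave</p>
    }
}
```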
@samikshaa223429: Use @ context='html' in your Sightly (HTL) script.