Hi,
I am creating a migration tool in Java to migrate content from another CMS into AEM pages. The data is in JSON format and consists of some metadata and HTML areas.
I am converting the HTML areas into Text components during the migration into AEM; however, the content does not display properly on the page.
After decoding the HTML as UTF-8, I see some weird characters such as �??
Is there a way to handle this in Java?
I tried using StringEscapeUtils and jsoup without any success.
Thanks,
Sam
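The � character is U+FFFD, the Unicode replacement character. It typically appears when bytes are decoded with a different charset than the one they were written in. The following minimal, JDK-only sketch reproduces both directions of such a mismatch. It assumes the sample word "Café" stands in for the migrated HTML; the real charset used by the source API is not known from the thread, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Written as ISO-8859-1, read as UTF-8: the single byte 0xE9 (e-acute in
    // ISO-8859-1) is not a valid UTF-8 sequence, so the decoder substitutes U+FFFD.
    static String latin1ReadAsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Written as UTF-8, read as ISO-8859-1: each byte of the two-byte UTF-8
    // sequence for e-acute becomes its own character (A-tilde, copyright sign).
    static String utf8ReadAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(latin1ReadAsUtf8("Caf\u00e9"));
        System.out.println(utf8ReadAsLatin1("Caf\u00e9"));
    }
}
```

Once the text has been decoded into U+FFFD, the original character is lost. Escaping or HTML-parsing libraries applied afterwards cannot restore it, so the charset has to be correct at the point where the bytes are first read.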
How are you getting the data? If it is through a servlet, you can set the response encoding like below:
response.setCharacterEncoding("UTF-8");
Also, if the text you are displaying shows HTML tags that should not appear, you need to add the @ context='html' option to the HTL expression that outputs the text, for example ${properties.text @ context='html'}.
If the above doesn't help, let me know how you are fetching the JSON in the Java code.
Hi Ibshikha,
Can you provide details about adding @ context='html' to the text content? Is there a way to add it programmatically in the JCR?
I am building the pages programmatically.
Yes, I am using a servlet and I have already set the character encoding (UTF-8) on the response; however, this does not resolve the issue. I am using the code below to fetch the JSON from the API:
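StringBuilder json = new StringBuilder();
URL url = new URL(src);
URLConnection tc = url.openConnection();
// Decode explicitly as UTF-8 (java.nio.charset.StandardCharsets);
// BufferedReader has no charset parameter, InputStreamReader does.
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(tc.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = in.readLine()) != null) { // read every line, not just the first
        json.append(line);
    }
}
return json.toString().trim();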
The above method fetches the JSON from the API into a StringBuilder; the API URL is passed in as src.
Please let me know if you have a solution to try.
Thanks,
Samiksha
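The fetch code above reads the stream through InputStreamReader. Unless a charset is passed to that constructor explicitly, it uses the platform default encoding, which is a common source of � characters. The following is a self-contained sketch of the fetch step that does not rely on the platform default. It assumes the API sends UTF-8 unless its Content-Type header declares another charset. The names JsonFetcher, charsetFromContentType, readAll and fetchJson are hypothetical helpers, not part of AEM or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JsonFetcher {

    // Extracts "charset=..." from a header such as
    // "application/json; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
    // when the header is missing or names an unknown charset.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole stream (all lines, line breaks preserved) with an explicit charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Hypothetical replacement for the fetch method in the question.
    static String fetchJson(String src) throws IOException {
        URLConnection conn = new URL(src).openConnection();
        Charset charset = charsetFromContentType(conn.getContentType());
        return readAll(conn.getInputStream(), charset).trim();
    }
}
```

Honoring the declared charset matters only if the API ever sends something other than UTF-8. If it always sends UTF-8, passing StandardCharsets.UTF_8 directly is enough.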
Hello @samikshaa223429,
As per my understanding, your HTML areas contain some special non-standard (non-ASCII) characters. You need to remove those characters when converting the HTML area to the Text component's text, by replacing all non-ASCII characters with an empty string. Something like below:
String textContent = htmlContent.replaceAll("[^\\p{ASCII}]", "");
Thanks,
Venkat.M
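The regex in the reply above removes every non-ASCII character. This also removes legitimate text such as accented letters or typographic dashes. The sketch below puts that approach next to two narrower variants. Both variants are alternatives that do not come from the thread. The first decomposes accented letters with java.text.Normalizer (NFD) before stripping, so e-acute becomes a plain "e" instead of disappearing. The second removes only U+FFFD replacement characters. The class and method names are hypothetical.

```java
import java.text.Normalizer;

public class NonAsciiCleanup {

    // The approach from the reply: drop every non-ASCII character,
    // including legitimate ones such as accented letters and en dashes.
    static String stripNonAscii(String html) {
        return html.replaceAll("[^\\p{ASCII}]", "");
    }

    // Alternative: decompose accented letters first (NFD turns e-acute into
    // "e" plus a combining accent), then drop the remaining non-ASCII marks.
    static String foldToAscii(String html) {
        String decomposed = Normalizer.normalize(html, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    // Narrower option: remove only U+FFFD replacement characters and keep
    // all other non-ASCII text intact.
    static String stripReplacementChars(String html) {
        return html.replace("\uFFFD", "");
    }
}
```

All three variants only hide the symptom. If the � comes from reading the source with the wrong charset, fixing the decoding step keeps the original characters instead of deleting them.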
@samikshaa223429: Use the @ context='html' option in your HTL (Sightly) expression.