Issue while migrating HTML from JSON object into text component in AEM | Community
samikshaa223429
June 24, 2021


Hi,

 

I am creating a migration tool in Java to migrate content from another CMS into AEM pages. The data is in JSON format and consists of some metadata and HTML areas.

During migration I convert the HTML areas into Text components in AEM; however, the data does not display properly on the page.

I see some weird characters after decoding the HTML as UTF-8, like �??

Is there a way to handle this in Java?

I tried using StringEscapeUtils and jsoup without any success.

 

Thanks,

Sam


3 replies

ibishika
June 24, 2021

How are you getting the data? If it is through a servlet, you can set the character encoding as below:

response.setCharacterEncoding("UTF-8");

 

Also, if you are displaying some text and see HTML tags in it that should not appear, you need to add @context='html' just after the text content.

 

If the above doesn't help, let me know how you are getting the JSON in the Java code.

samikshaa223429
June 28, 2021

Hi Ibshikha,

 

Can you provide details about adding @context='html' to the text content? Is there a way of adding it programmatically in the JCR?

As I am building the pages programmatically...

 

Yes, I am using a servlet and I have already set the character encoding (UTF-8) at the response level; however, this does not resolve the issue. I am using the code below to fetch the JSON from the API.

StringBuilder json = new StringBuilder();
url = new URL(src);
URLConnection tc = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()),"");
String line1 = in.readLine();
json.append(line1);
return json.toString().trim();
The above method is used to fetch the JSON from the API into a StringBuilder. The URL is passed in as src.

 

Please let me know if you have a solution to try.

 

Thanks,

Samiksha
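
One possible cause, judging only from the snippet above: InputStreamReader is created without a charset, so the bytes are decoded with the platform default, and only the first line of the response is read. A minimal sketch of a UTF-8 aware read follows. The class and method names (JsonFetcher, readAll, fetchJson) are hypothetical, and it assumes the API really does return UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class JsonFetcher {

    // Reads the whole stream, decoding the bytes as UTF-8 rather than the platform default.
    public static String readAll(InputStream stream) throws IOException {
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            // Loop until end of stream so multi-line responses are not truncated.
            while ((line = in.readLine()) != null) {
                json.append(line).append('\n');
            }
        }
        return json.toString().trim();
    }

    // Hypothetical entry point mirroring the original snippet; src is the API URL.
    public static String fetchJson(String src) throws IOException {
        return readAll(URI.create(src).toURL().openStream());
    }
}
```

If the source CMS serves a different encoding (for example ISO-8859-1 or Windows-1252), the charset passed to InputStreamReader would need to match that instead.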

vmadala
June 25, 2021

Hello @samikshaa223429 ,

 

As per my understanding, there are some non-standard special characters in your HTML areas that are non-ASCII. You need to remove those characters while converting from the HTML area to the Text component text by replacing all non-ASCII characters with an empty string. Something like below:

 

     String  textContent = htmlContent.replaceAll("[^\\p{ASCII}]", "");

 

Thanks,

Venkat.M
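
To make the effect of that replacement concrete, here is a small sketch that applies the same regex to a sample string. The class name and the sample text are made up for illustration. Note that the pattern removes every non-ASCII character, so legitimate accented letters are dropped along with any garbled ones.

```java
public class NonAsciiDemo {

    // Same replacement as suggested above: drop every character outside the ASCII range.
    public static String stripNonAscii(String html) {
        return html.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself pure ASCII.
        String sample = "<p>Caf\u00e9 d\u00e9j\u00e0 vu</p>";
        System.out.println(stripNonAscii(sample)); // prints <p>Caf dj vu</p>
    }
}
```

This may be acceptable when the content is known to be English-only, but it hides an encoding mismatch rather than fixing it.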

 

samikshaa223429
June 29, 2021
Thanks,

Adobe Employee
June 25, 2021

@samikshaa223429: Use @context='html' in your Sightly (HTL) code.

samikshaa223429
June 25, 2021
Hi Bimmi, currently I am storing it in an AEM Core Text component, so I don't have any Sightly code for rendering the HTML.
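
For the programmatic case, a rough sketch of the properties a migrated Text component node might carry is shown below. It assumes the Core Text component stores its content in a text property and uses a textIsRich flag to decide whether to render it as HTML, and that the v2 resource type is in use. None of this is confirmed in the thread, so check it against the component version on your instance. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TextComponentProps {

    // Builds the property map for a Text component node holding migrated HTML.
    public static Map<String, Object> richTextProps(String html) {
        Map<String, Object> props = new LinkedHashMap<>();
        // Assumed resource type; adjust to the Text component version or proxy your site uses.
        props.put("sling:resourceType", "core/wcm/components/text/v2/text");
        props.put("text", html);
        // Assumed flag that makes the component output the text as HTML instead of escaping it.
        props.put("textIsRich", "true");
        return props;
    }
}
```

In a Sling context, a map like this could be passed to ResourceResolver.create(parent, "text", props) and then saved with commit(); that part is left out here so the example stays self-contained.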