SOLVED

How to remove html tags from the content of Rich Text Editor ?

Former Community Member

For auto-tagging, I store the contents of the RTE in a string array and compare it against the tag namespace. When I do this, the array also ends up containing the HTML markup: tags, \n, &nbsp;, and so on. I can strip these manually with the following code:

textEntered = textEntered.replaceAll("\\<.*?\\>", "");
textEntered = textEntered.replace("\n", " ").replace("\r", " ");
textEntered = textEntered.replace("&nbsp;", " ");

Is there any other proper or more logical way to remove all the tags from the content of RTE? Any help for more understanding would be much appreciated. Thanks in advance. 
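For reference, the manual approach above can be wrapped in one helper. This is only a sketch using the JDK's regex classes; the names RteTextCleaner, stripMarkup and toWords are made up for illustration, and, unlike the snippet above, it replaces each tag with a space (an assumption, so that text from adjacent paragraphs does not run together).

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex-based cleanup, not a real HTML parser.
final class RteTextCleaner {
    // Anything that looks like a tag, e.g. <p>, </b>, <br/>.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    // Runs of spaces, tabs and line breaks.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the text with tags removed and whitespace collapsed.
    static String stripMarkup(String html) {
        if (html == null) {
            return "";
        }
        // Replace tags with a space so neighbouring blocks stay separated.
        String text = TAGS.matcher(html).replaceAll(" ");
        // Handle both the literal entity and the decoded non-breaking space.
        text = text.replace("&nbsp;", " ").replace('\u00A0', ' ');
        // Collapse \n, \r and repeated spaces into single spaces.
        text = WHITESPACE.matcher(text).replaceAll(" ");
        return text.trim();
    }

    // Splits the cleaned text into words for comparison against tag names.
    static String[] toWords(String html) {
        String text = stripMarkup(html);
        return text.isEmpty() ? new String[0] : text.split(" ");
    }
}
```

Calling RteTextCleaner.toWords(textEntered) would then give the word array to compare against the tag namespace. A regex like this only handles simple markup; other entities such as &amp; are left untouched.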

1 Accepted Solution

Correct answer by
Employee Advisor

You should use HTML parser libraries in Java for this use case. See an example here - http://jsoup.org/cookbook/extracting-data/attributes-text-html
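A minimal sketch of what the jsoup approach could look like, assuming jsoup is available on the classpath (for example as an OSGi bundle in AEM). The class and method names are hypothetical; Jsoup.parse and text() are the jsoup calls described in the linked cookbook page.

```java
import org.jsoup.Jsoup;

// Hypothetical helper around jsoup; requires the jsoup library.
final class JsoupTextExtractor {
    // Returns the visible text of an RTE fragment without any markup.
    static String plainText(String rteHtml) {
        if (rteHtml == null) {
            return "";
        }
        // Jsoup.parse builds a DOM from the fragment; text() returns the
        // combined text of all elements with entities decoded and
        // whitespace normalized.
        String text = Jsoup.parse(rteHtml).text();
        // Non-breaking space handling may differ between jsoup versions,
        // so U+00A0 is replaced explicitly here.
        return text.replace('\u00A0', ' ').trim();
    }
}
```

The result of JsoupTextExtractor.plainText(textEntered) can then be split into words as before. Compared with a regex, a parser copes with malformed markup, attributes containing ">" and arbitrary entities.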

3 Replies

Level 4

I think another way is to rewrite the HTML before you deliver it to the end browser.

http://jcr-nosql.com/2013/12/14/custom-rewriter-transformer-to-rewrite-any-html-output-generated-by-...

 

Thanks,

Chandra

Level 10

Hi

It seems your code removes all the HTML tags every time you read data from the rich text editor, which looks like a simple enough approach to me.

If you still want a better way, I would recommend customizing the rich text editor to suit your needs, as has been done here [1].

[1] http://experience-aem.blogspot.in/2014/02/aem-cq-56-extend-richtext-editor-add-new-plugin-pullquote....