SOLVED

How to remove HTML tags from the content of the Rich Text Editor?

Former Community Member

For the purpose of auto tagging, I am storing the contents of the RTE in a string array and comparing it against the tag namespace. When I store the RTE contents in the string array, I get all the HTML tags, line breaks (\n), &nbsp; entities, etc. in the array. I am able to remove these manually with the following code:

textEntered = textEntered.replaceAll("\\<.*?\\>", "");           // remove tags
textEntered = textEntered.replace("\n", " ").replace("\r", " "); // line breaks to spaces
textEntered = textEntered.replace("&nbsp;", " ");                // non-breaking space entity

Is there a more proper or robust way to remove all the tags from the RTE content? Any help understanding this would be much appreciated. Thanks in advance.
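For reference, the manual approach above can be consolidated into one place. This is a minimal sketch, not a complete HTML decoder: the class and method names (`RteRegexCleaner`, `toPlainText`, `matchTags`) are hypothetical, the tag set is assumed to hold lowercase single-word tag names, and only a handful of common entities are decoded. It replaces each tag with a space rather than an empty string, so words in adjacent paragraphs are not glued together, and it uses `<[^>]*>` so that a tag broken across a line is still removed.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper class; a sketch of the regex approach, not a full HTML decoder.
final class RteRegexCleaner {
    // From '<' to the next '>'; [^>] also matches line breaks, unlike '.'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Runs of whitespace, including the non-breaking space U+00A0.
    private static final Pattern SPACES = Pattern.compile("[\\s\\u00A0]+");
    // Anything that is not a letter, digit, or hyphen separates words.
    private static final Pattern WORD_SEPARATOR = Pattern.compile("[^\\p{L}\\p{N}-]+");

    static String toPlainText(String html) {
        // Replace tags with a space so "<p>a</p><p>b</p>" yields "a b", not "ab".
        String text = TAG.matcher(html).replaceAll(" ");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse all whitespace to single spaces.
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    // Hypothetical: returns the words of the RTE text that appear in tagNames
    // (assumed lowercase, single-word), in order of first occurrence.
    static Set<String> matchTags(String html, Set<String> tagNames) {
        Set<String> matches = new LinkedHashSet<>();
        String lower = toPlainText(html).toLowerCase(Locale.ROOT);
        for (String word : WORD_SEPARATOR.split(lower)) {
            if (!word.isEmpty() && tagNames.contains(word)) {
                matches.add(word);
            }
        }
        return matches;
    }
}
```

A regex still cannot reliably handle comments, `<script>` blocks, or attribute values that contain `>`, and replacing tags with spaces splits a word that has inline markup in the middle of it. Multi-word tags will also never match a single token. Those limitations are the main argument for a real parser.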

1 Accepted Solution

Correct answer by
Employee Advisor

You should use an HTML parser library for Java, such as jsoup, for this use case. See an example here: http://jsoup.org/cookbook/extracting-data/attributes-text-html
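With jsoup, the plain text of a fragment is typically obtained with `Jsoup.parse(html).text()`. If adding a dependency is not an option, the JDK ships an older HTML parser in `javax.swing.text.html.parser`. The sketch below swaps in that built-in parser instead of jsoup; the class name `RteParserText` is hypothetical. The parser reports character data through `HTMLEditorKit.ParserCallback.handleText`, so markup never reaches the output and entities such as `&amp;` and `&nbsp;` arrive already decoded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: extracts text with the JDK's built-in (Swing) HTML parser.
final class RteParserText {
    // Runs of whitespace, including the decoded non-breaking space U+00A0.
    private static final Pattern SPACES = Pattern.compile("[\\s\\u00A0]+");

    static String toPlainText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Only character data arrives here; tags and attributes go to other callbacks.
                // A trailing space keeps text from separate elements apart.
                out.append(data).append(' ');
            }
        };
        try {
            // true = ignore any charset declared in the markup; the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(e);
        }
        return SPACES.matcher(out).replaceAll(" ").trim();
    }
}
```

The Swing parser only understands HTML 3.2, so jsoup remains the more robust option for modern RTE markup; the plain text from either can then be split into words for tag matching.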

3 Replies

Level 4

I think another way is to rewrite the HTML before it is delivered to the browser.

http://jcr-nosql.com/2013/12/14/custom-rewriter-transformer-to-rewrite-any-html-output-generated-by-...

Thanks,

Chandra

Level 10

Hi

It seems your code removes all the HTML tags every time you get data from the rich text editor, and that looks like a simpler way to me.

But if you still want a better way, I would recommend customizing the rich text editor to fit your use case, as has been done here [1].

[1] http://experience-aem.blogspot.in/2014/02/aem-cq-56-extend-richtext-editor-add-new-plugin-pullquote....