Adobe Experience Manager Sites & More

10/15/15

For auto-tagging, I am storing the contents of the RTE in a string array and comparing it against the tag namespace. When I store the RTE contents in the array, I also get all the HTML markup (tags, \n, &nbsp;, etc.) in the array. I am able to remove this markup manually with the following code:

textEntered = textEntered.replaceAll("\\<.*?\\>", "");
textEntered = textEntered.replace("\n", " ").replace("\r", " ");
textEntered = textEntered.replace("&nbsp;", " ");

Is there a more proper or logical way to remove all the tags from the RTE content? Any help would be much appreciated. Thanks in advance.

Kunal_Gaba_ · 10/15/15

You should use HTML parser libraries in Java for this use case. See an example here - http://jsoup.org/cookbook/extracting-data/attributes-text-html
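To illustrate the parser-based approach: with jsoup on the classpath, the text of a fragment is available through `Jsoup.parse(html).text()`. The sketch below shows the same idea using the HTML parser that ships with the JDK (`javax.swing.text.html`) so that it runs without extra dependencies. This is a stand-in for jsoup, not what the linked cookbook shows, and the class and method names (`RteTextExtractor`, `extractText`) are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: extracts plain text from an RTE HTML fragment
// with the JDK's built-in HTML parser instead of regular expressions.
final class RteTextExtractor {

    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The parser calls handleText for each run of character data,
        // with entities such as &nbsp; already decoded.
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate runs with a space so words on either side of a tag do not merge.
                text.append(data).append(' ');
            }
        };

        try {
            // Third argument: ignore any charset declaration inside the markup.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }

        // Turn non-breaking spaces into regular spaces, then collapse whitespace.
        return text.toString()
                .replace('\u00a0', ' ')
                .replaceAll("\\s+", " ")
                .trim();
    }
}
```

One trade-off in this sketch: because a space is added after every text run, inline markup inside a word (for example `wor<b>ld</b>`) ends up split into two words. For word-by-word tag matching that is usually acceptable, but it is a choice made for this example.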

Chandra_gupta · 10/15/15

I think another way is to rewrite the HTML before you deliver it to the end browser.

http://jcr-nosql.com/2013/12/14/custom-rewriter-transformer-to-rewrite-any-html-output-generated-by-...

Thanks,

Chandra

edubey · 10/15/15

Hi

It seems that every time you get data from the rich text, your code will remove all the HTML tags. That looks like a simple enough way to me.

Still, if you want a better way, I would recommend customizing the rich text editor to behave the way you need, as has been done here [1].

[1] http://experience-aem.blogspot.in/2014/02/aem-cq-56-extend-richtext-editor-add-new-plugin-pullquote....
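
For reference, the simple regex-based cleanup from the question, together with the tag comparison it feeds, might be sketched roughly as follows. The class name, method names, and the set of known tag names are hypothetical; in AEM the names would come from the tag namespace rather than a hard-coded set.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: regex cleanup of RTE markup plus a word-by-word
// comparison against a set of known tag names.
final class RteTagMatcher {

    // Strips tags, decodes &nbsp;, and collapses all whitespace (including \n and \r).
    // Tags are replaced with a space, not an empty string as in the question,
    // so that words separated only by a tag do not run together.
    static String toPlainText(String html) {
        return html
                .replaceAll("<[^>]*>", " ")
                .replace("&nbsp;", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Returns the known tag names that occur as words in the RTE content,
    // in order of first appearance. knownTags is assumed to be lower case.
    static Set<String> matchTags(String html, Set<String> knownTags) {
        Set<String> matches = new LinkedHashSet<>();
        String plain = toPlainText(html).toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or digit.
        for (String word : plain.split("[^\\p{L}\\p{N}]+")) {
            if (knownTags.contains(word)) {
                matches.add(word);
            }
        }
        return matches;
    }
}
```

This only matches single-word tag names and only handles the `&nbsp;` entity; other entities or multi-word tags would need extra handling, which is where a real HTML parser becomes more attractive.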