The RTE component in AEM has two main paste modes: wordhtml and plaintext. As the name suggests, plaintext mode strips all markup, while wordhtml keeps the markup and handles most tags well. However, when authors copy a list (ordered or unordered) from a Microsoft Word document (desktop application) and paste it directly into the RTE, the list is not preserved. Each item becomes a separate <p> tag containing a dot (.) and six <span> tags instead of a proper <ul> or <ol>.
Solution:
To resolve this, the JavaScript of the out-of-the-box (OOTB) EditToolsPlugin was customized as follows:
Create a custom clientlib for the RTE and apply it only to the RTE component.
Override EditToolsPlugin.js to intercept the paste operation, clean up MS Word formatting, and convert list paragraphs into proper <ul> or <ol> elements while preserving other styles (bold, italics, etc.).
Configure the default paste mode as "wordhtml" so that Word content is handled consistently.
Optionally, customize the toolbar to enable the "paste as wordhtml" option.
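The list conversion performed inside the override can be sketched as a standalone function. This is a minimal, hypothetical sketch, not the actual EditToolsPlugin code: the names convertWordLists, toListItem, stripTags and the temporary data-word-list attribute are illustrative. It assumes Word marks list items with a class starting with MsoListParagraph and wraps each visible bullet or number in an <![if !supportLists]> ... <![endif]> block (raw or comment-wrapped). It handles single-level lists only, and it does not show how the function is hooked into EditToolsPlugin.js, since that wiring depends on the AEM version.

```javascript
// Hypothetical sketch (not AEM's API): turn Word "list paragraphs" in pasted
// HTML into real <ul>/<ol> markup. Assumes Word's clipboard HTML shape:
//   <p class=MsoListParagraph...><![if !supportLists]>MARKER<![endif]>Text</p>
// Single-level lists only; nesting (mso-list level2+) is not reconstructed.

// A Word list paragraph (MsoListParagraph, ...CxSpFirst, ...CxSpMiddle, ...CxSpLast).
const LIST_PARA_RE = /<p\b[^>]*class=["']?MsoListParagraph[^>]*>([\s\S]*?)<\/p>/gi;

// The block holding the visible bullet or number, raw or comment-wrapped.
const MARKER_BLOCK_RE =
  /<!(?:--)?\[if !supportLists\](?:--)?>([\s\S]*?)<!(?:--)?\[endif\](?:--)?>/i;

// A run of adjacent converted items that share the same list type.
const RUN_RE =
  /<li data-word-list="(ul|ol)">[\s\S]*?<\/li>(?:\s*<li data-word-list="\1">[\s\S]*?<\/li>)*/g;

function stripTags(html) {
  return html.replace(/<[^>]*>/g, "").replace(/&nbsp;|\u00a0/g, " ").trim();
}

// Replace one Word list paragraph with a temporary, typed <li>.
function toListItem(_paragraph, inner) {
  const markerBlock = inner.match(MARKER_BLOCK_RE);
  const marker = markerBlock ? stripTags(markerBlock[1]) : "";
  // "1.", "a)", "iv." => ordered; glyph bullets such as "·", "o", "§" => unordered.
  const type = /^[0-9a-z]+[.)]$/i.test(marker) ? "ol" : "ul";
  const content = inner
    .replace(MARKER_BLOCK_RE, "") // drop the fake bullet/number
    .replace(/<o:p>[\s\S]*?<\/o:p>/gi, "") // drop Word's <o:p> elements
    .trim();
  return `<li data-word-list="${type}">${content}</li>`;
}

// Convert list paragraphs to typed items, then wrap each run in <ul>/<ol>.
function convertWordLists(html) {
  return html
    .replace(LIST_PARA_RE, toListItem)
    .replace(RUN_RE, (run, type) =>
      `<${type}>${run.replace(/ data-word-list="(?:ul|ol)"/g, "")}</${type}>`);
}

// Usage: a two-item numbered list as Word might place it on the clipboard.
const pasted =
  "<p class=MsoListParagraphCxSpFirst><![if !supportLists]><span style='mso-list:Ignore'>1." +
  "<span>&nbsp;&nbsp;</span></span><![endif]><b>Bold</b> step<o:p></o:p></p>\n" +
  "<p class=MsoListParagraphCxSpLast><![if !supportLists]><span style='mso-list:Ignore'>2." +
  "<span>&nbsp;&nbsp;</span></span><![endif]>Plain step<o:p></o:p></p>";

console.log(convertWordLists(pasted));
// <ol><li><b>Bold</b> step</li>
// <li>Plain step</li></ol>
```

Regex-based HTML rewriting is fragile; inside the browser, walking the pasted fragment with DOMParser would likely be more robust. The regex form is used here only so the sketch runs outside a browser (for example in Node.js), and inline formatting such as <b> is passed through untouched.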
Hi @MukeshYadav_, I appreciate you sharing this detailed solution! The way you've broken down the Microsoft Word list paste issue in the AEM RTE, and customized the EditToolsPlugin to convert lists while preserving formatting, is super helpful.
Curious question: when tackling Word paste issues like this, do you usually prefer customizing the plugin client-side as shown, or do you sometimes handle it server-side by cleaning or filtering the HTML before saving? How do you decide which approach works best for a project?
Client-side processing lets the author see the exact list structure immediately and edit it directly in the RTE pane. Pressing "Enter" adds a new list item, and existing items can be edited right away, whereas with server-side processing the author would have to save and reopen the dialog before the cleaned-up list appears. Without client-side processing, any edits or additions the author makes to the pasted list may end up wrapped in <p> or <span> tags, which leads to inconsistent markup and frustrated authors.