
SOLVED

Is there documentation about importing an MS Word file into Adobe Guides?


Level 3

I am working on an Adobe Guides set-up project, and I have not been able to find any documentation on how to import an MS Word document and have it ingested and converted into a working DITA file in Adobe Guides. Is there any documentation posted about this? It is touted as a feature of the application, but there does not appear to be any information about it, nor does it "just work" with the platform.

Thank you for any insight! 

1 Accepted Solution


Correct answer by
Employee

AEM Guides lets you migrate your existing Word documents (.docx) into DITA topics. You specify the input and output folder locations, along with other parameters, and the document is converted into DITA. Depending on the content, you may get a .dita file and a .ditamap file.

 

To convert a Word document successfully, the document must be well structured: it should start with a Title, followed by Heading 1, Heading 2, and so on, and each heading should have some content under it. If the document is not well structured, the conversion might not work as expected.
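Before uploading, a quick pre-flight check can catch documents that break these structural rules. The following is a minimal Python sketch using only the standard library; it is not part of AEM Guides. It reads the paragraph styles stored in word/document.xml inside the .docx file and assumes the English built-in style IDs Title and Heading1 to Heading9 (localized Word installations may store different IDs). It treats any non-empty, non-heading paragraph as "content", and the function names are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# Assumption: English built-in heading style IDs (Heading1 ... Heading9)
HEADING = re.compile(r"Heading[1-9]$")


def is_heading(style_id: str) -> bool:
    return HEADING.match(style_id) is not None


def paragraph_styles(docx_bytes: bytes) -> list[tuple[str, bool]]:
    """Return (style ID, has text) for each paragraph, in document order."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    result = []
    for p in root.iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val", "") if style is not None else ""
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        result.append((style_id, bool(text.strip())))
    return result


def check_structure(docx_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    # Empty paragraphs are ignored: they do not count as content
    styles = [s for s, has_text in paragraph_styles(docx_bytes) if has_text]
    problems = []
    if not styles or styles[0] != "Title":
        problems.append("document does not start with a Title paragraph")
    if not any(is_heading(s) for s in styles):
        problems.append("document has no Heading paragraphs")
    for i, s in enumerate(styles):
        if s == "Title" or is_heading(s):
            nxt = styles[i + 1] if i + 1 < len(styles) else None
            if nxt is None or nxt == "Title" or is_heading(nxt):
                problems.append(f"non-empty paragraph {i + 1} ({s}) has no content after it")
    return problems
```

Calling check_structure on the bytes of a .docx file returns an empty list when these basic checks pass; it does not cover the other guidelines (grouped images, headers and footers).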

 

By default, AEM Guides uses the Word-to-DITA (Word2DITA) transformation framework. This transformation depends on the style-to-tag mapping configuration file.
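The real mapping lives in the style-to-tag map file, whose format is described in the DITA For Publishers documentation. As a simplified illustration of the idea only, the Python sketch below maps Word run formatting to DITA inline elements, the way bold, italic, and underline become <b>, <i>, and <u>. The RUN_PROP_TO_TAG table and the function names are hypothetical; they do not reproduce the real map file or the Word2DITA code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical, simplified table: Word run property element -> DITA inline element
RUN_PROP_TO_TAG = {"b": "b", "i": "i", "u": "u"}
# Attribute values that switch a run property off (e.g. <w:b w:val="0"/>, <w:u w:val="none"/>)
OFF_VALUES = {"0", "false", "off", "none"}


def run_to_dita(run: ET.Element) -> str:
    """Convert one <w:r> run to escaped text wrapped in the mapped DITA elements."""
    text = escape("".join(t.text or "" for t in run.iter(W + "t")))
    rpr = run.find(W + "rPr")
    if rpr is None:
        return text
    for prop, tag in RUN_PROP_TO_TAG.items():
        el = rpr.find(W + prop)
        if el is not None and el.get(W + "val") not in OFF_VALUES:
            text = f"<{tag}>{text}</{tag}>"
    return text


def paragraph_to_dita(p: ET.Element) -> str:
    """Convert a <w:p> paragraph to a DITA <p>, run by run (direct child runs only)."""
    return "<p>" + "".join(run_to_dita(r) for r in p.findall(W + "r")) + "</p>"
```

A table-driven design like this keeps the formatting rules in data rather than code, which is the same reason Word2DITA keeps its mapping in a separate configuration file that you can customize.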

To use the Word2DITA transformation successfully, follow these guidelines when preparing your Word document for conversion:

NOTE: If you change the default style-to-tag mapping configuration file, you must update these guidelines so that they conform to your updated style mapping.

  • Ensure that your document starts with a Title; this Title is mapped to the DITA map title. Also, the Title must be followed by some regular content.
  • After the Title, use Heading 1, Heading 2, and so on, and give each heading some content. Each heading is converted into a new Concept topic. The hierarchy of the generated topics follows the heading levels in the document: for example, a Heading 2 topic is nested under the preceding Heading 1 topic, and a Heading 3 topic under the preceding Heading 2 topic.
  • The document must contain at least one heading.
  • Ensure that your document does not contain any grouped images. If it does, ungroup all such images.
  • Remove all headers and footers.
  • Inline styles such as bold, italics, and underline are converted into <b>, <i>, and <u> elements.
  • All ordered and unordered lists are converted into <ol> and <ul> elements. This also applies to nested lists and to lists within tables, notes, or footnotes.
  • All hyperlinks are converted into <xref>.
  • The filename of each converted file is based on the heading text followed by a file number. The file number is a sequential number based on the position of the heading in the document. For example, if the heading text is “Sample Heading” and it is the 10th heading in the document, the resulting filename for this topic will be similar to Sample_Heading_10.dita.
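The hierarchy rule and the filename rule above can be illustrated with a small Python sketch. It is not the Word2DITA implementation: it assumes that positions count headings only, starting at 1, and that runs of non-word characters in the heading text become a single underscore; the actual sanitizing rules in AEM Guides may differ. Topic, topic_filename, and build_topic_tree are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    level: int          # heading level: 1 for Heading 1, 2 for Heading 2, ...
    title: str
    filename: str
    children: list["Topic"] = field(default_factory=list)


def topic_filename(heading_text: str, position: int) -> str:
    """Heading text plus the heading's 1-based position among all headings."""
    # Assumption: runs of non-word characters collapse to one underscore
    stem = re.sub(r"\W+", "_", heading_text).strip("_")
    return f"{stem}_{position}.dita"


def build_topic_tree(headings: list[tuple[int, str]]) -> list[Topic]:
    """Nest each heading's topic under the nearest preceding heading of a lower level."""
    roots: list[Topic] = []
    stack: list[Topic] = []  # currently open ancestors, outermost first
    for position, (level, text) in enumerate(headings, start=1):
        topic = Topic(level, text, topic_filename(text, position))
        while stack and stack[-1].level >= level:
            stack.pop()  # close siblings and deeper topics
        (stack[-1].children if stack else roots).append(topic)
        stack.append(topic)
    return roots
```

For instance, topic_filename("Sample Heading", 10) returns "Sample_Heading_10.dita", matching the example in the guideline.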

Perform the following steps to convert your existing Word documents into DITA topics:

  1. Log in to AEM and open CRXDE Lite.
  2. Navigate to the default configuration file available at the following location: /libs/fmdita/config/w2d_io.xml
  3. Create an overlay of the config folder under the apps node (/apps/fmdita/config).
  4. Navigate to the configuration file available in the apps node: /apps/fmdita/config/w2d_io.xml
     The w2d_io.xml file contains the following configurable parameters:
      –	In the inputDir element, specify the location of the input folder that contains your source Word documents. For example, if your Word documents are stored in a folder named wordtodita in the projects folder, specify the location as:
      /content/dam/projects/wordtodita/
      –	In the outputDir element, specify the location of the output folder for the converted DITA documents, or keep the default output location. If the specified output folder does not exist in the DAM, the conversion workflow creates it.
      –	In the createRev element, specify whether a new version of the converted DITA topic is created (true) or not (false).
      –	In the s2tMap element, specify the location of the map file that maps Word document styles to DITA elements. The default mapping is stored in the file located at:
      /libs/fmdita/word2dita/word-builtin-styles-style2tagmap.xml
      NOTE: For more information about the structure of the word-builtin-styles-style2tagmap.xml file and how to customize it, see Style to Tag Mapping in the DITA For Publishers User Guide.
      –	In the props2Propagate element, specify the properties to pass on to the DITA map. Use this element to pass default metadata such as dc:title, dc:subject, dam:keywords, and dam:category from the document metadata to the converted DITA assets.
      
  5. Save the w2d_io.xml file.
  6. After configuring the required parameters in the w2d_io.xml file, log in to AEM and open the Assets UI.
  7. Navigate to the input folder location (wordtodita).
  8. Upload the source Word documents into this folder. For information on uploading content to the DAM, see Upload existing DITA content.
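If you would rather script the edits from step 4 than make them by hand in CRXDE Lite, the element updates can be sketched in Python as below. This is an assumption-laden sketch, not an AEM Guides tool: it expects the text of your copy of the default /libs/fmdita/config/w2d_io.xml, updates the named elements inside one <config> block, and keeps XML comments. Writing the result back to /apps/fmdita/config/w2d_io.xml (for example through CRXDE Lite) is not shown. update_config and EXAMPLE_VALUES are hypothetical names; the values are the examples from the parameter list above.

```python
import xml.etree.ElementTree as ET


def update_config(xml_text: str, values: dict[str, str], index: int = 0) -> str:
    """Set element values inside the index-th <config> block of a w2d_io.xml copy."""
    # Keep comments from the default file instead of silently dropping them
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    configs = list(root.iter("config"))  # also matches the root if it is <config>
    if not configs:
        raise ValueError("no <config> block found")
    config = configs[index]
    for name, value in values.items():
        el = config.find(name)
        if el is None:  # element missing from this block: append it
            el = ET.SubElement(config, name)
        el.text = value
    return ET.tostring(root, encoding="unicode")


# Example values taken from the parameter descriptions above (outputDir left at its default)
EXAMPLE_VALUES = {
    "inputDir": "/content/dam/projects/wordtodita/",
    "createRev": "true",
    "s2tMap": "/libs/fmdita/word2dita/word-builtin-styles-style2tagmap.xml",
    "props2Propagate": "dc:title,dc:subject,dam:keywords,dam:category",
}
```

Editing the copied default file, rather than generating one from scratch, keeps whatever root element and extra settings your AEM Guides version ships with.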

      

The w2d_io.xml file can contain one or more <config> </config> blocks, each defining a conversion configuration. When you upload Word documents to the input folder, the conversion workflow runs and saves the resulting DITA output in the location specified in the <outputDir> element.
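To reason about which outputDir applies when the file holds several <config> blocks, here is a hypothetical Python sketch. It assumes an uploaded asset is handled by the block whose inputDir contains the asset's folder, preferring the most specific (longest) inputDir; this matching rule is an assumption for illustration, not documented AEM Guides behavior. load_configs and output_dir_for are hypothetical names.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Optional


def load_configs(xml_text: str) -> list[dict[str, str]]:
    """Read every <config> block into a dict of element name -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in cfg}
        for cfg in root.iter("config")
    ]


def output_dir_for(asset_path: str, configs: list[dict[str, str]]) -> Optional[str]:
    """Return the outputDir of the most specific block whose inputDir holds the asset.

    Returns None if no block matches, or if the matching block has no outputDir
    (in which case the default output location would apply).
    """
    folder = posixpath.dirname(asset_path).rstrip("/") + "/"
    best_input, best_output = "", None
    for cfg in configs:
        if not cfg.get("inputDir"):
            continue  # a block without inputDir cannot match anything
        in_dir = cfg["inputDir"].rstrip("/") + "/"
        # Trailing slashes stop /word/ from matching /wordx/
        if folder.startswith(in_dir) and len(in_dir) > len(best_input):
            best_input, best_output = in_dir, cfg.get("outputDir")
    return best_output
```

For example, with two blocks whose inputDir values are /content/dam/projects/wordtodita/ and /content/dam/legal/word/ (hypothetical folders), an upload to /content/dam/legal/word/contract.docx resolves to the second block's outputDir.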

 


6 Replies


Level 3

Thank you! I suspected we would need to update the config file to support this, but that is what I was not able to find. I will have our development team take on that action.


Level 1

Hi Vijay,
I tried the steps above to convert a Word document to DITA, but I can't see any files in the destination folder.
Below is the log I see. Can you please help me with this?

16.06.2023 17:19:31.132 *DEBUG* [JobHandler: /var/workflow/instances/server0/2023-06-16/word2dita_16:/content/dam/test-dita/hello.docx] com.adobe.fmdita.publishworkflow.ConvertWordTODita Adding jcr:content node in folders where it is missing
16.06.2023 17:19:31.225 *INFO* [JobHandler: /var/workflow/instances/server0/2023-06-16/word2dita_16:/content/dam/test-dita/hello.docx] com.adobe.fmdita.conversionutils.ConversionUtils Sending conversion complete event for path /content/dam/test-dita/hello.docx
16.06.2023 17:44:27.868 *ERROR* [[0:0:0:0:0:0:0:1] [1686917667866] POST /bin/referencelistener HTTP/1.1] com.adobe.fmdita.versioncontrol.VersionUtils Index 0 out of bounds for length 0
16.06.2023 17:45:58.127 *ERROR* [JobHandler: /var/workflow/instances/server0/2023-06-16/word2dita_18:/content/dam/test-dita/hello.docx] com.adobe.fmdita.publishworkflow.ConvertWordTODita Error caught : Node with path /content/dam/fmdita-outputs/hello does not exist.


Employee

I assume you have created the source and destination folders, for example "word files" and "w2d".
Refer to Migrating Word Documents using AEM Guides.
If you are still having issues, please contact Adobe Support.


Level 1

It works now. The issue was with the Word document itself: it was not structured according to the DITA conversion rules.


Employee

Can you give more insight into the problem?
Which element was not structured according to the DITA rules?