SOLVED

Convert HTML to Markdown

Level 6

How can I convert HTML to Markdown in Java? I came across Flexmark-osgi, but it converts Markdown to HTML, not HTML to Markdown.

1 Accepted Solution

Correct answer by
Employee

Hi @Shaheena_Sheikh ,

You can use the third-party dependency com.vladsch.flexmark by embedding it in your OSGi bundle with the bnd-maven-plugin. That way the dependency is packaged inside your OSGi bundle and available at runtime.
For example:

<plugin>
    <groupId>biz.aQute.bnd</groupId>
    <artifactId>bnd-maven-plugin</artifactId>
    <version>${bnd.version}</version>
    <executions>
        <execution>
            <id>bnd-process</id>
            <goals>
                <goal>bnd-process</goal>
            </goals>
            <configuration>
                <bnd>
                  <![CDATA[
Bundle-Category: ${componentGroupName}

# export all versioned packages except for conditional ones (https://github.com/bndtools/bnd/issues/3721#issuecomment-579026778)
-exportcontents: ${packages;VERSIONED}
# adding conditional packages
-conditionalpackage: com.vladsch.flexmark.html2md.converter.*,com.vladsch.flexmark.*,org.jsoup.*,javax.annotation.meta;
# reproducible builds (https://github.com/bndtools/bnd/issues/3521)
-noextraheaders: true
-snapshot: SNAPSHOT

Bundle-DocURL:
-plugin org.apache.sling.caconfig.bndplugin.ConfigurationClassScannerPlugin
-plugin org.apache.sling.bnd.models.ModelsScannerPlugin
                  ]]>
                </bnd>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.caconfig.bnd-plugin</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.bnd.models</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>scriptingbundle-maven-plugin</artifactId>
            <version>0.5.0</version>
        </dependency>
    </dependencies>
</plugin>
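
Once the bundle is built with the configuration above, one quick way to confirm that the embedded packages are really visible to your code is to try loading the converter class by name. The following is only a minimal sketch using the standard library; the class name EmbeddedClassCheck and the method isLoadable are hypothetical names chosen for illustration, and in a real OSGi bundle you would call the check from one of your own components so that the bundle's class loader is used.

```java
public final class EmbeddedClassCheck {

    private EmbeddedClassCheck() {
    }

    /**
     * Returns true if the named class can be loaded by the given class loader.
     * The class is not initialized, so no static initializers run.
     */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not visible to this class loader (or a dependency of it is missing)
            return false;
        }
    }

    public static void main(String[] args) {
        // Use the loader of this class; inside a bundle that is the bundle's class loader
        ClassLoader loader = EmbeddedClassCheck.class.getClassLoader();
        String converter = "com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter";
        System.out.println(converter + " loadable: " + isLoadable(converter, loader));
    }
}
```

If the check reports false inside the deployed bundle, the package list in -conditionalpackage is the first place to look.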

 

 


5 Replies

Employee

Hi @Shaheena_Sheikh ,

The flexmark-java library can convert HTML to Markdown in Java.
Dependency:

<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-html2md-converter</artifactId>
    <version>0.64.0</version>
</dependency>

Code:

String md = FlexmarkHtmlConverter.builder().build().convert(html);

Class:

com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter

For complete details about the API, refer to:
https://github.com/vsch/flexmark-java

Alternatively,

You can also convert HTML to Markdown using the convertHTML() methods of the Converter class in the Aspose.HTML for Java library.
For example:


    // Prepare HTML code and save it to a file.
    // Files.write throws IOException, so call this from a method that declares or handles it.
    String code = "<h1>Convert HTML to Markdown Using Java</h1>"
            + "<h2>How to Convert HTML to MD in Java</h2>"
            + "<p>The Aspose.HTML for Java library allows you to convert HTML to Markdown.</p>";
    java.nio.file.Files.write(java.nio.file.Paths.get("conversion.html"),
            code.getBytes(java.nio.charset.StandardCharsets.UTF_8));

    // Call convertHTML() method; the output is written to conversion.md in the working directory
    // (MarkdownSaveOptions comes from the Aspose.HTML library)
    com.aspose.html.converters.Converter.convertHTML("conversion.html", new MarkdownSaveOptions(), "conversion.md");

 

Please refer to the following link for complete details about converting HTML to Markdown using Aspose: https://docs.aspose.com/html/java/convert-html-to-markdown/

If you don't have the Aspose dependency in your POM, follow these steps.


1. First of all, you need to specify Aspose Cloud Maven Repository configuration/location in your Maven pom.xml as below:

<repositories>
   <repository>
       <id>AsposeJavaAPI</id>
       <name>Aspose Java API</name>
       <url>http://repository.aspose.com/repo/</url>
   </repository>
</repositories>

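2. Then add the dependency for the Aspose Java API that you want to use. For example, if you are a customer of Aspose.Words for Java and want to use API version 23.4, you would specify the following dependency in your pom.xml (for the HTML-to-Markdown conversion shown above, you would use the Aspose.HTML for Java artifact rather than aspose-words):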

<dependency>
   <groupId>com.aspose</groupId>
   <artifactId>aspose-words</artifactId>
   <version>23.4</version>
</dependency>


Both libraries can convert HTML to Markdown and offer additional features on top of that. Use the one that best fits your needs.
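
Since either library could end up doing the conversion, it may help to keep the choice behind a small wrapper so the rest of your code does not depend on one of them directly. The sketch below is only an illustration under that assumption: MarkdownExporter and export are hypothetical names, and the converter is passed in as a plain function so the example compiles without either library on the classpath.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

public final class MarkdownExporter {

    // The actual HTML-to-Markdown conversion, supplied by the caller
    private final UnaryOperator<String> htmlToMarkdown;

    public MarkdownExporter(UnaryOperator<String> htmlToMarkdown) {
        this.htmlToMarkdown = Objects.requireNonNull(htmlToMarkdown, "htmlToMarkdown");
    }

    /** Returns trimmed Markdown for the given HTML, or an empty string for null or blank input. */
    public String export(String html) {
        if (html == null || html.trim().isEmpty()) {
            return "";
        }
        String markdown = htmlToMarkdown.apply(html);
        return markdown == null ? "" : markdown.trim();
    }

    public static void main(String[] args) {
        // Stand-in converter so the sketch runs without any third-party jar.
        // With flexmark on the classpath you would pass
        // FlexmarkHtmlConverter.builder().build()::convert instead.
        MarkdownExporter exporter =
                new MarkdownExporter(html -> "# " + html.replaceAll("<[^>]+>", ""));
        System.out.println(exporter.export("<h1>Heading</h1>"));
    }
}
```

Swapping libraries then only changes the function passed to the constructor, not the callers of export.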

Thanks and Regards,
Ayush

 

Level 6

Thank you for your response.

Flexmark-osgi is the only one that can be used in OSGi. Thus, flexmark-all is not compatible and wouldn't work.

What is the license type for Aspose? Is it compatible with OSGi?

Level 6

Hi,

Thanks for your response. The answer makes complete sense. I will definitely try it out.

Level 4

To convert HTML to Markdown in Java, you can use a library called "jsoup." Jsoup is a Java library for working with HTML, parsing, manipulating, and cleaning it. Although jsoup doesn't have built-in support for converting HTML to Markdown, you can use it to extract the HTML elements and then use another library, such as "flexmark-java," to perform the actual conversion from HTML to Markdown.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

public class HtmlToMarkdownConverter {

    public static String convertHtmlToMarkdown(String html) {
        // Parse HTML using jsoup
        Document doc = Jsoup.parse(html);

        // Select the content you want to convert (e.g., <body>)
        Element body = doc.body();

        // Convert the selected HTML to Markdown with flexmark's HTML-to-Markdown converter.
        // (Parser and HtmlRenderer go the other way, from Markdown to HTML.)
        return FlexmarkHtmlConverter.builder().build().convert(body.html());
    }

    public static void main(String[] args) {
        String html = "<h1>Heading</h1><p>This is a paragraph.</p><ul><li>Item 1</li><li>Item 2</li></ul>";

        String markdown = convertHtmlToMarkdown(html);
        System.out.println(markdown);
    }
}

Make sure to include the jsoup and flexmark-java (flexmark-html2md-converter) dependencies in the POM.