Convert HTML to Markdown | Community
Best answer by ayushmishra07

Hi @shaheena_sheikh ,

You can use the third-party dependency

com.vladsch.flexmark

by embedding it in your OSGi bundle with the bnd-maven-plugin. That way, the library's packages are available inside the bundle at runtime.
For example:

 

<plugin>
    <groupId>biz.aQute.bnd</groupId>
    <artifactId>bnd-maven-plugin</artifactId>
    <version>${bnd.version}</version>
    <executions>
        <execution>
            <id>bnd-process</id>
            <goals>
                <goal>bnd-process</goal>
            </goals>
            <configuration>
                <bnd><![CDATA[
Bundle-Category: ${componentGroupName}

# export all versioned packages except for conditional ones (https://github.com/bndtools/bnd/issues/3721#issuecomment-579026778)
-exportcontents: ${packages;VERSIONED}

# adding conditional packages
-conditionalpackage: com.vladsch.flexmark.html2md.converter.*,com.vladsch.flexmark.*,org.jsoup.*,javax.annotation.meta;

# reproducible builds (https://github.com/bndtools/bnd/issues/3521)
-noextraheaders: true
-snapshot: SNAPSHOT

Bundle-DocURL:
-plugin org.apache.sling.caconfig.bndplugin.ConfigurationClassScannerPlugin
-plugin org.apache.sling.bnd.models.ModelsScannerPlugin
                ]]></bnd>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.caconfig.bnd-plugin</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.bnd.models</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>scriptingbundle-maven-plugin</artifactId>
            <version>0.5.0</version>
        </dependency>
    </dependencies>
</plugin>

 

 

2 replies

ayushmishra07
Adobe Employee
July 11, 2023

Hi @shaheena_sheikh ,

The flexmark-java library can convert HTML to Markdown in Java.
Dependency:

<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-html2md-converter</artifactId>
    <version>0.64.0</version>
</dependency>

Code : 

String md = FlexmarkHtmlConverter.builder().build().convert(html);

Class : 

 

com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter

 

For complete details about the API, see:
https://github.com/vsch/flexmark-java 
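
If you call the converter from several places, it can help to hide the one-line call above behind a small null-safe wrapper, so the library stays swappable. Below is a minimal sketch of that idea, not part of flexmark: the class name MarkdownExporter and its toMarkdown() method are hypothetical, and the converter passed in main() is only a stand-in so the example runs without any third-party jar.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

/**
 * Hypothetical null-safe wrapper around an HTML-to-Markdown function.
 * The actual conversion is delegated to whatever library you plug in.
 */
public final class MarkdownExporter {

    private final UnaryOperator<String> converter;

    public MarkdownExporter(UnaryOperator<String> converter) {
        this.converter = Objects.requireNonNull(converter, "converter");
    }

    /** Returns "" for null or blank input, otherwise the trimmed converter output. */
    public String toMarkdown(String html) {
        if (html == null || html.trim().isEmpty()) {
            return "";
        }
        String markdown = converter.apply(html);
        return markdown == null ? "" : markdown.trim();
    }

    public static void main(String[] args) {
        // Stand-in converter (assumption): strips tags and prefixes "# ".
        // With flexmark on the classpath you would pass instead:
        //   html -> FlexmarkHtmlConverter.builder().build().convert(html)
        MarkdownExporter exporter =
                new MarkdownExporter(html -> "# " + html.replaceAll("<[^>]+>", "") + "\n");
        System.out.println(exporter.toMarkdown("<h1>Heading</h1>")); // prints "# Heading"
    }
}
```

The rest of your code then depends only on MarkdownExporter, so moving between flexmark and another library means changing a single lambda.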

Alternatively,

You can also convert HTML to Markdown using the convertHTML() methods of the Converter class in the Aspose.HTML for Java library.
For example:

 

// Prepare HTML code and save it to a file
String code = StringExtensions.concat(
        "<h1>Convert HTML to Markdown Using Java</h1>",
        "<h2>How to Convert HTML to MD in Java</h2>",
        "<p>The Aspose.HTML for Java library allows you to convert HTML to Markdown.</p>");
com.aspose.html.internal.ms.System.IO.File.writeAllText("conversion.html", code);

// Call convertHTML() method
com.aspose.html.converters.Converter.convertHTML(
        "conversion.html",
        new MarkdownSaveOptions(),
        Path.combine(getOutputDir(), "conversion.md"));

 

Please refer to the following link for complete details about converting HTML to Markdown using Aspose: https://docs.aspose.com/html/java/convert-html-to-markdown/

If you don't have the Aspose dependency in your POM yet, please follow these steps.

 

1. First of all, you need to specify Aspose Cloud Maven Repository configuration/location in your Maven pom.xml as below:

<repositories>
   <repository>
       <id>AsposeJavaAPI</id>
       <name>Aspose Java API</name>
       <url>http://repository.aspose.com/repo/</url>
   </repository>
</repositories>

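2. Then, add the dependency for the Aspose Java API that you want to use. For example, if you are a customer of Aspose.Words for Java and want to use API version 23.4, you need to specify the following dependency in your pom.xml. Note that the HTML to Markdown conversion shown above comes from Aspose.HTML for Java, so for that use case you would add the Aspose.HTML artifact in the same way; check the Aspose repository for its current coordinates.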

<dependency>
   <groupId>com.aspose</groupId>
   <artifactId>aspose-words</artifactId>
   <version>23.4</version>
</dependency>


Both of these libraries can help you in converting HTML to Markdown and also provide a bunch of other features on top of that. Please use the one that best serves your need.

Thanks and Regards,
Ayush

 

Level 6
July 12, 2023

Thank you for your response.

Flexmark-osgi is the only one that can be used in OSGi. Thus, flexmark-all is not compatible and wouldn't work.

What would be the license type for Aspose? Is it compatible with OSGi?

Level 6
August 23, 2023


Hi,

Thanks for your response. The answer makes complete sense. I will definitely try it out.

Level 3
July 12, 2023

To convert HTML to Markdown in Java, you can use a library called "jsoup." Jsoup is a Java library for working with HTML, parsing, manipulating, and cleaning it. Although jsoup doesn't have built-in support for converting HTML to Markdown, you can use it to extract the HTML elements and then use another library, such as "flexmark-java," to perform the actual conversion from HTML to Markdown.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

public class HtmlToMarkdownConverter {

    public static String convertHtmlToMarkdown(String html) {
        // Parse HTML using Jsoup (this also normalizes malformed markup)
        Document doc = Jsoup.parse(html);

        // Select the content you want to convert (e.g., <body>)
        Element body = doc.body();

        // Convert HTML to Markdown. Note: flexmark's Parser and HtmlRenderer go the
        // other way (Markdown to HTML), so FlexmarkHtmlConverter is used here.
        return FlexmarkHtmlConverter.builder().build().convert(body.html());
    }

    public static void main(String[] args) {
        String html = "<h1>Heading</h1><p>This is a paragraph.</p><ul><li>Item 1</li><li>Item 2</li></ul>";
        String markdown = convertHtmlToMarkdown(html);
        System.out.println(markdown);
    }
}

Make sure to include the jsoup and flexmark-java libraries in the POM.