Convert HTML to Markdown | Community
2 replies

ayushmishra07
Adobe Employee
July 11, 2023

Hi @shaheena_sheikh,

The flexmark-java library can convert HTML to Markdown in Java.
Dependency:

<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-html2md-converter</artifactId>
    <version>0.64.0</version>
</dependency>

Code : 

String md = FlexmarkHtmlConverter.builder().build().convert(html);

Class:

com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter

For complete details about the API, please refer to the following link:
https://github.com/vsch/flexmark-java 

Alternatively,

You can also convert HTML to Markdown using the convertHTML() methods of the Converter class in the Aspose.HTML for Java library.
For example:

// Prepare HTML code and save it to a file
String code = StringExtensions.concat(
        "<h1>Convert HTML to Markdown Using Java</h1>",
        "<h2>How to Convert HTML to MD in Java</h2>",
        "<p>The Aspose.HTML for Java library allows you to convert HTML to Markdown.</p>");
com.aspose.html.internal.ms.System.IO.File.writeAllText("conversion.html", code);

// Call convertHTML() method
com.aspose.html.converters.Converter.convertHTML(
        "conversion.html",
        new MarkdownSaveOptions(),
        Path.combine(getOutputDir(), "conversion.md"));

 

For complete details on converting HTML to Markdown with Aspose, see: https://docs.aspose.com/html/java/convert-html-to-markdown/
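The Aspose snippet writes the HTML to disk through an internal Aspose class and builds the output path with helpers from the Aspose docs. If you would rather keep that part on the plain JDK, here is a minimal sketch using java.nio only. The class and method names (HtmlFilePreparer, writeHtml, markdownPathFor) are made up for illustration and are not part of Aspose or flexmark; the convertHTML() call is left as a comment because it needs the Aspose dependency.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: prepares the input file and output path for a
// file-based HTML-to-Markdown converter, using only the JDK.
public class HtmlFilePreparer {

    /** Writes the HTML as UTF-8 to workDir/inputName and returns the file path. */
    public static Path writeHtml(Path workDir, String inputName, String html) throws IOException {
        Files.createDirectories(workDir);
        Path input = workDir.resolve(inputName);
        Files.write(input, html.getBytes(StandardCharsets.UTF_8));
        return input;
    }

    /** Derives a sibling path with the extension swapped for ".md". */
    public static Path markdownPathFor(Path htmlFile) {
        String name = htmlFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return htmlFile.resolveSibling(base + ".md");
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("html2md");
        String html = "<h1>Convert HTML to Markdown Using Java</h1>"
                + "<p>The Aspose.HTML for Java library allows you to convert HTML to Markdown.</p>";

        Path input = writeHtml(workDir, "conversion.html", html);
        Path output = markdownPathFor(input);

        System.out.println("HTML written to: " + input);
        System.out.println("Markdown target: " + output);

        // With the Aspose dependency on the classpath, the paths could then be
        // passed to the call from the post (not executed in this sketch):
        // Converter.convertHTML(input.toString(), new MarkdownSaveOptions(), output.toString());
    }
}
```

Writing into a temporary directory avoids depending on the process working directory, which is usually not something you control inside an AEM/OSGi container.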

If you don't have the Aspose dependency in your POM, please follow these steps.

1. First, add the Aspose Cloud Maven Repository to your Maven pom.xml as below:

<repositories>
   <repository>
       <id>AsposeJavaAPI</id>
       <name>Aspose Java API</name>
       <url>http://repository.aspose.com/repo/</url>
   </repository>
</repositories>

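2. Then add the dependency for the Java API that you want to use. For example, if you are a customer of Aspose.Words for Java and want to use API version 23.4, you would add the dependency below. For the convertHTML() example above, substitute the Aspose.HTML for Java artifact (aspose-html, to my knowledge) and the version you are licensed for: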

<dependency>
   <groupId>com.aspose</groupId>
   <artifactId>aspose-words</artifactId>
   <version>23.4</version>
</dependency>


Both of these libraries can convert HTML to Markdown and also provide a number of other features on top of that. Please use the one that best serves your needs.

Thanks and Regards,
Ayush

 

Level 6
July 12, 2023

Thank you for your response.

flexmark-osgi is the only flexmark artifact that can be used in OSGi, so flexmark-all is not compatible and wouldn't work.

What is the license type for Aspose? Is it compatible with OSGi?

ayushmishra07
Adobe Employee
Accepted solution
July 12, 2023

Hi @shaheena_sheikh ,

You can use the third-party dependency

com.vladsch.flexmark

by embedding it in your own OSGi bundle using the bnd-maven-plugin. That way the flexmark classes are available inside your bundle. The flexmark artifact still has to be declared as a regular dependency in the same pom.xml so that bnd can find the classes to embed.
For example:

<plugin>
    <groupId>biz.aQute.bnd</groupId>
    <artifactId>bnd-maven-plugin</artifactId>
    <version>${bnd.version}</version>
    <executions>
        <execution>
            <id>bnd-process</id>
            <goals>
                <goal>bnd-process</goal>
            </goals>
            <configuration>
                <bnd><![CDATA[
Bundle-Category: ${componentGroupName}

# export all versioned packages except for conditional ones (https://github.com/bndtools/bnd/issues/3721#issuecomment-579026778)
-exportcontents: ${packages;VERSIONED}

# adding conditional packages
-conditionalpackage: com.vladsch.flexmark.html2md.converter.*,com.vladsch.flexmark.*,org.jsoup.*,javax.annotation.meta;

# reproducible builds (https://github.com/bndtools/bnd/issues/3521)
-noextraheaders: true
-snapshot: SNAPSHOT

Bundle-DocURL:
-plugin org.apache.sling.caconfig.bndplugin.ConfigurationClassScannerPlugin
-plugin org.apache.sling.bnd.models.ModelsScannerPlugin
                ]]></bnd>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.caconfig.bnd-plugin</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.bnd.models</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>scriptingbundle-maven-plugin</artifactId>
            <version>0.5.0</version>
        </dependency>
    </dependencies>
</plugin>
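One way to sanity-check the result: as I understand it, -conditionalpackage copies the matching packages that your code references into the bundle JAR, so after a build the flexmark and jsoup classes should show up inside it. Below is a rough sketch of such a check using only java.util.jar. The class name EmbeddedPackageCheck and its method are hypothetical, and the path to the built JAR is whatever your module produces under target/.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: counts the classes embedded in a bundle JAR for a given package.
public class EmbeddedPackageCheck {

    /** Counts .class entries in the JAR that live under the given package (including sub-packages). */
    public static long countClasses(Path jar, String packageName) throws IOException {
        String prefix = packageName.replace('.', '/') + "/";
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(prefix) && name.endsWith(".class"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: EmbeddedPackageCheck <path-to-bundle.jar>");
            return;
        }
        Path jar = Paths.get(args[0]);
        // Package roots taken from the -conditionalpackage line in the post.
        for (String pkg : new String[] {"com.vladsch.flexmark", "org.jsoup"}) {
            System.out.println(pkg + ": " + countClasses(jar, pkg) + " embedded classes");
        }
    }
}
```

A count of zero for com.vladsch.flexmark would suggest that nothing was embedded, for example because the dependency is missing from the POM or no bundle class references flexmark yet.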

 

 

Level 3
July 12, 2023

To convert HTML to Markdown in Java, you can use a library called "jsoup." Jsoup is a Java library for working with HTML, parsing, manipulating, and cleaning it. Although jsoup doesn't have built-in support for converting HTML to Markdown, you can use it to extract the HTML elements and then use another library, such as "flexmark-java," to perform the actual conversion from HTML to Markdown.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

public class HtmlToMarkdownConverter {

    public static String convertHtmlToMarkdown(String html) {
        // Parse HTML using jsoup
        Document doc = Jsoup.parse(html);

        // Select the content you want to convert (e.g., <body>)
        Element body = doc.body();

        // Convert HTML to Markdown with flexmark's HTML-to-Markdown converter.
        // (Parser and HtmlRenderer go the other way, from Markdown to HTML.)
        return FlexmarkHtmlConverter.builder().build().convert(body.html());
    }

    public static void main(String[] args) {
        String html = "<h1>Heading</h1><p>This is a paragraph.</p><ul><li>Item 1</li><li>Item 2</li></ul>";
        String markdown = convertHtmlToMarkdown(html);
        System.out.println(markdown);
    }
}

Make sure to include the jsoup and flexmark-html2md-converter dependencies in the POM.