How can I convert HTML to Markdown in Java? I came across Flexmark-osgi, but it converts markdown to HTML.
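Hi @Shaheena_Sheikh,
You can use the third-party dependency com.vladsch.flexmark by embedding it in your own OSGi bundle with the bnd-maven-plugin. That way the flexmark classes (and the jsoup classes they depend on) are packaged inside your bundle, so you don't need a separate flexmark OSGi bundle. The flexmark artifact still has to be declared as a regular Maven dependency of the bundle module so that bnd can find the classes on the build classpath.
For example: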
<plugin>
    <groupId>biz.aQute.bnd</groupId>
    <artifactId>bnd-maven-plugin</artifactId>
    <version>${bnd.version}</version>
    <executions>
        <execution>
            <id>bnd-process</id>
            <goals>
                <goal>bnd-process</goal>
            </goals>
            <configuration>
                <bnd>
                    <![CDATA[
Bundle-Category: ${componentGroupName}
# export all versioned packages except for conditional ones (https://github.com/bndtools/bnd/issues/3721#issuecomment-579026778)
-exportcontents: ${packages;VERSIONED}
# adding conditional packages
-conditionalpackage: com.vladsch.flexmark.html2md.converter.*,com.vladsch.flexmark.*,org.jsoup.*,javax.annotation.meta
# reproducible builds (https://github.com/bndtools/bnd/issues/3521)
-noextraheaders: true
-snapshot: SNAPSHOT
Bundle-DocURL:
-plugin org.apache.sling.caconfig.bndplugin.ConfigurationClassScannerPlugin
-plugin org.apache.sling.bnd.models.ModelsScannerPlugin
                    ]]>
                </bnd>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.caconfig.bnd-plugin</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>org.apache.sling.bnd.models</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sling</groupId>
            <artifactId>scriptingbundle-maven-plugin</artifactId>
            <version>0.5.0</version>
        </dependency>
    </dependencies>
</plugin>
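After deploying, one quick way to confirm that bnd actually inlined the flexmark packages is to try loading the converter class through your bundle's own class loader. Below is a minimal JDK-only sketch; DependencyCheck and isAvailable are hypothetical names, not part of bnd, Sling or flexmark. Inside an OSGi component you would pass getClass().getClassLoader() so the lookup happens in your bundle's class space.

```java
public final class DependencyCheck {

    private DependencyCheck() {}

    /**
     * Returns true if the named class can be found by the given class loader.
     * Hypothetical helper for diagnosing missing embedded packages.
     */
    public static boolean isAvailable(String className, ClassLoader loader) {
        try {
            // initialize = false: only look the class up, don't run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on this loader's class path, or a dependency of it is missing
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = DependencyCheck.class.getClassLoader();
        String converter = "com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter";
        System.out.println(converter + " available: " + isAvailable(converter, loader));
    }
}
```

If this reports false inside AEM, the usual suspects are a missing Maven dependency on flexmark-html2md-converter or a package that is not matched by -conditionalpackage.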
Hi @Shaheena_Sheikh,
The flexmark-java library can convert HTML to Markdown in Java through its flexmark-html2md-converter module.
Dependency:
<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-html2md-converter</artifactId>
    <version>0.64.0</version>
</dependency>
Code:
String md = FlexmarkHtmlConverter.builder().build().convert(html);
Class:
com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter
For complete details about the API, see:
https://github.com/vsch/flexmark-java
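To make the HTML-to-Markdown mapping concrete, here is a deliberately tiny, hypothetical converter (TinyHtmlToMarkdown is not part of any library) that uses only java.util.regex and handles a few attribute-free tags. It is not how flexmark works (flexmark-html2md-converter parses the HTML into a real DOM with jsoup), and it will break on nested lists, attributes, entities and tables, so treat it purely as an illustration of the output shape, not as a replacement for the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy HTML-to-Markdown converter for illustration only; do not use on real-world HTML. */
public final class TinyHtmlToMarkdown {

    private static final Pattern HEADING =
            Pattern.compile("<h([1-6])>(.*?)</h\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private TinyHtmlToMarkdown() {}

    public static String convert(String html) {
        // <hN>text</hN> -> N '#' characters, a space, the text, then a blank line
        Matcher m = HEADING.matcher(html);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String hashes = "#".repeat(Integer.parseInt(m.group(1)));
            m.appendReplacement(sb, Matcher.quoteReplacement(hashes + " " + m.group(2) + "\n\n"));
        }
        m.appendTail(sb);

        String md = sb.toString()
                // <strong>/<b> -> **bold**, <em>/<i> -> *italic*
                .replaceAll("(?is)<(strong|b)>(.*?)</\\1>", "**$2**")
                .replaceAll("(?is)<(em|i)>(.*?)</\\1>", "*$2*")
                // each <li> becomes a "- " bullet line; the <ul> wrapper ends with a newline
                .replaceAll("(?is)<li>(.*?)</li>", "- $1\n")
                .replaceAll("(?i)<ul>", "")
                .replaceAll("(?i)</ul>", "\n")
                // paragraphs are followed by a blank line
                .replaceAll("(?is)<p>(.*?)</p>", "$1\n\n")
                // drop any remaining tags this toy does not understand
                .replaceAll("<[^>]+>", "");
        return md.strip();
    }

    public static void main(String[] args) {
        String html = "<h1>Heading</h1><p>This is a paragraph.</p><ul><li>Item 1</li><li>Item 2</li></ul>";
        System.out.println(convert(html));
    }
}
```

For the sample HTML in main, this prints "# Heading", a blank line, "This is a paragraph.", a blank line, and then the bullets "- Item 1" and "- Item 2".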
Alternatively, you can convert HTML to Markdown with the convertHTML() methods of the Converter class in the Aspose.HTML for Java library.
For example:
// Prepare HTML code and save it to a file (uses java.nio.file.Files, java.nio.file.Paths, java.nio.charset.StandardCharsets)
String code = "<h1>Convert HTML to Markdown Using Java</h1>"
        + "<h2>How to Convert HTML to MD in Java</h2>"
        + "<p>The Aspose.HTML for Java library allows you to convert HTML to Markdown.</p>";
Files.write(Paths.get("conversion.html"), code.getBytes(StandardCharsets.UTF_8));
// Call convertHTML() (MarkdownSaveOptions is com.aspose.html.saving.MarkdownSaveOptions)
com.aspose.html.converters.Converter.convertHTML("conversion.html", new MarkdownSaveOptions(), "conversion.md");
For complete details about converting HTML to Markdown with Aspose, see: https://docs.aspose.com/html/java/convert-html-to-markdown/
If you don't have the Aspose dependency in your POM yet, follow these steps:
1. First, add the Aspose Cloud Maven Repository to your pom.xml (use https: Maven 3.8.1 and later block plain-http repositories by default):
<repositories> <repository> <id>AsposeJavaAPI</id> <name>Aspose Java API</name> <url>https://repository.aspose.com/repo/</url> </repository> </repositories>
2. Then add the dependency for the Aspose Java API you want to use. For example, if you are a customer of Aspose.Words for Java and want to use API version 23.4, add the following dependency to your pom.xml:
<dependency> <groupId>com.aspose</groupId> <artifactId>aspose-words</artifactId> <version>23.4</version> </dependency>
Note that the HTML-to-Markdown example above uses Aspose.HTML for Java, which is a separate product with its own artifact (aspose-html).
Both libraries can convert HTML to Markdown, and each offers many other features as well. Use the one that best fits your needs.
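Since either library can do the job, one option is to hide the choice behind a small wrapper so the rest of your code does not depend on flexmark or Aspose directly. This is a minimal sketch; MarkdownService is a hypothetical class, and the stand-in lambda in main only strips paragraph tags. In a real bundle you would pass FlexmarkHtmlConverter.builder().build()::convert, assuming the single-argument convert(html) call shown earlier in this thread.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Hypothetical wrapper that decouples callers from the HTML-to-Markdown library in use. */
public final class MarkdownService {

    private final UnaryOperator<String> converter;

    public MarkdownService(UnaryOperator<String> converter) {
        this.converter = Objects.requireNonNull(converter, "converter");
    }

    /** Converts HTML to Markdown; null or blank input yields an empty string. */
    public String toMarkdown(String html) {
        if (html == null || html.isBlank()) {
            return "";
        }
        String markdown = converter.apply(html);
        // Remove trailing whitespace/newlines a converter may leave at the end
        return markdown == null ? "" : markdown.stripTrailing();
    }

    public static void main(String[] args) {
        // Stand-in converter for demonstration only (not a real HTML-to-Markdown conversion)
        MarkdownService service =
                new MarkdownService(html -> html.replace("<p>", "").replace("</p>", "\n"));
        System.out.println(service.toMarkdown("<p>Hello</p>"));
    }
}
```

Swapping libraries then only means changing the function passed to the constructor, which also makes the calling code easy to unit-test without either library on the class path.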
Thanks and Regards,
Ayush
Thank you for your response.
Flexmark-osgi is the only one that can be used in OSGi. Thus, flexmark-all is not compatible and wouldn't work.
What is the license type for Aspose? Is it compatible with OSGi?
Hi,
Thanks for your response. The answer makes complete sense. I will definitely try it out.
To convert HTML to Markdown in Java, you can combine jsoup with flexmark-java. jsoup is a Java library for parsing, manipulating and cleaning HTML. It has no built-in HTML-to-Markdown conversion, but you can use it to parse the document and select the elements you want, then pass that HTML to flexmark-java's HTML-to-Markdown converter (FlexmarkHtmlConverter from flexmark-html2md-converter) for the actual conversion.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
public class HtmlToMarkdownConverter {
    public static String convertHtmlToMarkdown(String html) {
        // Parse HTML using jsoup
        Document doc = Jsoup.parse(html);
        // Select the content you want to convert (e.g., <body>)
        Element body = doc.body();
        // Convert the selected HTML to Markdown with flexmark's html2md converter.
        // (flexmark's Parser + HtmlRenderer go the other way: Markdown -> HTML.)
        FlexmarkHtmlConverter converter = FlexmarkHtmlConverter.builder().build();
        return converter.convert(body.html());
    }
    public static void main(String[] args) {
        String html = "<h1>Heading</h1><p>This is a paragraph.</p><ul><li>Item 1</li><li>Item 2</li></ul>";
        String markdown = convertHtmlToMarkdown(html);
        System.out.println(markdown);
    }
}
Make sure to include jsoup and flexmark-html2md-converter in the POM (flexmark-html2md-converter itself already depends on jsoup).