How can I render a PDF from AEM DAM into JSON, including tables and images?

Hi,

In my project, I want to convert PDFs stored in the AEM DAM into JSON, including tables and images.

I tried adding this dependency to core/pom.xml:

===========================================

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.5</version>
</dependency>

==============================================

After adding this dependency, the Core bundle stays in the Installed state instead of becoming Active, as shown in /system/console/bundles and /system/console/components.

I also tried uploading the PDFBox bundles through the web console, but that did not solve the issue.

Any suggestions would be appreciated.

 

===================================

When I used Tika, I got only the text content in the JSON.

2 Replies

Hi @Syed_Shaik ,

 

Your Core bundle is stuck in the “Installed” state because PDFBox requires other libraries that are not present in AEM’s OSGi runtime, so the bundle's package imports cannot be resolved. Simply uploading the PDFBox JARs in the console does not solve this, since they are not fully OSGi-ready and also depend on additional libraries such as fontbox and commons-logging. Tika, on the other hand, works out of the box in AEM because it is already included, but it only provides plain text instead of structured JSON.

To fix this, you have two choices. The simpler option is to use Tika to extract the text from the PDFs and then write a small custom service that formats this text into JSON. The more advanced option is to embed PDFBox and its required dependencies into your Core bundle using the Maven Bundle Plugin, so that AEM can activate the bundle without dependency errors.

For most cases, Tika combined with a JSON conversion service is the easiest and most reliable approach. PDFBox should only be used if you need advanced PDF parsing features such as tables, coordinates, or layouts.
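As a rough sketch of the embedding option mentioned above, the fragment below shows what the Maven Bundle Plugin configuration in core/pom.xml might look like. It is not taken from a working project. The list of artifact IDs (pdfbox, pdfbox-io, fontbox) and the optional import for org.bouncycastle are assumptions about what PDFBox 3.0.5 pulls in, so check them against the output of mvn dependency:tree. commons-logging is left out on the assumption that the AEM runtime already exports org.apache.commons.logging.

```xml
<plugin>
    <groupId>org.apache.felix</groupId>
    <artifactId>maven-bundle-plugin</artifactId>
    <extensions>true</extensions>
    <configuration>
        <instructions>
            <!-- Copy the selected jars into the Core bundle.
                 The artifactId list is an assumption; verify it with mvn dependency:tree. -->
            <Embed-Dependency>*;artifactId=pdfbox|pdfbox-io|fontbox;scope=compile|runtime</Embed-Dependency>
            <!-- Also consider transitive dependencies that match the filter above. -->
            <Embed-Transitive>true</Embed-Transitive>
            <!-- Assumption: optional packages such as BouncyCastle are not needed at runtime,
                 so mark them optional instead of letting them block bundle resolution. -->
            <Import-Package>org.bouncycastle.*;resolution:=optional,*</Import-Package>
        </instructions>
    </configuration>
</plugin>
```

If the bundle still does not resolve after this, the bundle details page in /system/console/bundles lists the packages that remain unresolved; each one has to be either embedded or marked optional. Projects generated from newer AEM archetypes use bnd-maven-plugin rather than maven-bundle-plugin, in which case the instructions are written differently.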

 
Thanks & Regards,
Vishal

 
