Adobe Experience Manager Sites & More

Syed_Shaik · 9/16/25

Hi,

In my project, I want to convert PDFs from the DAM into JSON.

I tried adding this dependency to core/pom.xml:

===========================================

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.5</version>
</dependency>

==============================================

After adding this dependency, the core bundle stays in the Installed state instead of becoming Active (checked in /system/console/bundles and /system/console/components).

I also tried uploading the PDFBox bundles through the web console, but that did not solve the issue either.

Any suggestions would be appreciated.

===================================

When I used Tika instead, the resulting JSON contained only the plain text content.

arunpatidar · 9/17/25

Hi @Syed_Shaik

Please check https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/facing-org-apache-pdfbox-p...

Arun Patidar

VishalKa5 · 9/17/25

Hi @Syed_Shaik ,

The main reason your core bundle is stuck in the “installed” state is that it now imports PDFBox packages that nothing in AEM’s OSGi runtime exports, so the bundle cannot be resolved. The bundle's detail page in /system/console/bundles lists the imports that could not be resolved. Uploading only the PDFBox JAR in the console usually does not fix this, because PDFBox itself depends on additional libraries such as fontbox and commons-logging, and each of those has to be resolvable as well.

Tika, on the other hand, works out of the box because it is already included in AEM, but it only gives you plain text rather than structured JSON.

To fix this, you have two choices:

1. The simpler option: use Tika to extract the text from the PDF, then write a small custom service that formats that text into JSON.

2. The more advanced option: embed PDFBox and its required dependencies into your core bundle using the Maven Bundle Plugin, so that AEM can activate the bundle without dependency errors.

For most cases, Tika plus a JSON conversion service is the easiest and most reliable approach. PDFBox is only worth the extra setup if you need advanced PDF parsing features such as tables, coordinates, or layout information.
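To give an idea of the first option, here is a minimal sketch of the "format the text into JSON" step. It is only an illustration under a few assumptions: the class name `PdfTextJson`, its methods, and the output shape (`path` plus a `paragraphs` array) are hypothetical and not an AEM or Tika API, and the input string is assumed to be the plain text you already got from Tika (for example via `Tika.parseToString`). It uses only the JDK, so it adds nothing that could stop the bundle from resolving.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not an AEM or Tika class): turns the plain text
 * extracted from a PDF into a small JSON document.
 */
public final class PdfTextJson {

    private PdfTextJson() {
    }

    /** Splits non-null text into paragraphs on blank lines and drops empty ones. */
    public static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String part : text.split("\\R\\s*\\R")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    /** Escapes a value so it can be placed inside a JSON string literal. */
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be \\u-escaped in JSON.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds {"path": ..., "paragraphs": [...]} from the asset path and its text. */
    public static String toJson(String assetPath, String text) {
        StringBuilder json = new StringBuilder();
        json.append("{\"path\":\"").append(escape(assetPath)).append("\",\"paragraphs\":[");
        List<String> paragraphs = splitParagraphs(text);
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(escape(paragraphs.get(i))).append('"');
        }
        return json.append("]}").toString();
    }
}
```

You would call it as `PdfTextJson.toJson(asset.getPath(), extractedText)` from your own service or servlet. In a real project you would probably build the JSON with whichever JSON library your bundle already uses instead of escaping by hand; the manual escaping here just keeps the sketch free of extra dependencies.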

Thanks & Regards,

Vishal