
Pull metadata for an accessible PDF in DAM

Below are several ways to retrieve accessibility metadata from a PDF stored in AEM DAM (Dynamic Media).

1. Extract Metadata Using AEM Metadata Schema Editor

You can configure AEM Metadata Schema to expose accessibility-related metadata fields.

Steps:

  1. Navigate to Metadata Schemas:
    • Go to AEM → Tools → Assets → Metadata Schemas.
  2. Edit the Default Metadata Schema:
    • Select an existing metadata schema (e.g., default).
    • Click Edit.
  3. Add a New Metadata Field for Accessibility:
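    • Drag a field component (e.g., Single Line Text or Dropdown) from the Build Form tab onto the form.
    • In the Settings tab, set Map to property to ./jcr:content/metadata/pdfuaid:part (the PDF/UA part identifier).
    • Use "Single Line Text" to show the raw value, or a "Dropdown" whose choice value is 1 (PDF/UA-1) to restrict what editors can enter.
    • Save the schema and apply it to the folders that hold your PDF assets.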
  4. View Metadata for a PDF:
    • Go to AEM Assets.
    • Select a PDF file and click Properties.
    • Navigate to the Metadata tab to see pdfuaid:part and other accessibility metadata.


2. Extract Metadata Using CRXDE Lite

AEM stores metadata in JCR (Java Content Repository). You can check PDF metadata in CRXDE Lite.

Steps:

  1. Open CRXDE Lite:
    • Navigate to http://<aem-instance>:4502/crx/de.
  2. Locate the PDF in the DAM:
    • Go to /content/dam/<your-folder>/<pdf-file>.pdf/jcr:content/metadata.
  3. Check Accessibility Metadata Fields:
    • Look for fields like:
      dc:title (Title)
      dc:language (Document Language)
      pdfuaid:part (PDF/UA Compliance)
      pdf:Tagged (Tagged PDF: true/false)
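
The same node can also be read without CRXDE Lite, because Sling's default GET servlet renders a node as JSON when .json is appended to its path. The following is a minimal sketch, assuming that JSON rendering is reachable on your instance (it is often blocked on publish or behind a dispatcher). The host, asset path, credentials, and helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# Property names taken from the list above.
ACCESSIBILITY_FIELDS = ("dc:title", "dc:language", "pdfuaid:part", "pdf:Tagged")


def metadata_url(base_url, asset_path):
    """Build the URL of an asset's metadata node rendered as JSON."""
    node_path = f"{asset_path}/jcr:content/metadata.json"
    # Percent-encode spaces and similar characters, keep '/' and ':' as-is.
    return base_url.rstrip("/") + urllib.parse.quote(node_path, safe="/:")


def pick_accessibility_fields(metadata):
    """Keep only the accessibility-related properties that are present."""
    return {k: metadata[k] for k in ACCESSIBILITY_FIELDS if k in metadata}


def fetch_metadata(base_url, asset_path, user, password):
    """Fetch the metadata node over HTTP using basic authentication."""
    request = urllib.request.Request(metadata_url(base_url, asset_path))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, pick_accessibility_fields(fetch_metadata("http://localhost:4502", "/content/dam/docs/report.pdf", user, password)) would return only the fields listed above that exist on that asset.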

3. Retrieve PDF Metadata via AEM Query Builder API

AEM provides an API to query metadata, which you can use to extract accessibility information.

Example Query to Find Accessible PDFs

Use AEM Query Builder at:

http://<aem-instance>:4502/libs/cq/search/content/querydebug.html

Enter this query to list all PDF/UA-compliant PDFs:

path=/content/dam
type=dam:Asset
property=jcr:content/metadata/pdfuaid:part
property.value=1
p.limit=-1

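This returns every asset under /content/dam whose metadata has pdfuaid:part="1", i.e. PDFs that declare PDF/UA-1 conformance. Note that the field reflects what the PDF declares about itself; it is not a validation result.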
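
The same predicates can be sent to the Query Builder JSON servlet at /bin/querybuilder.json instead of the debugger page. Here is a rough sketch of building that request URL and reading asset paths from the result. The function names and host are made up for illustration, and the "hits"/"path" keys are what the servlet returns by default as far as I know.

```python
from urllib.parse import urlencode


def querybuilder_url(base_url, part="1", root="/content/dam"):
    """Build a Query Builder URL for assets with the given pdfuaid:part value."""
    predicates = {
        "path": root,
        "type": "dam:Asset",
        "property": "jcr:content/metadata/pdfuaid:part",
        "property.value": part,
        "p.limit": "-1",  # return all hits; use a smaller limit on large repositories
    }
    return f"{base_url.rstrip('/')}/bin/querybuilder.json?{urlencode(predicates)}"


def hit_paths(result):
    """Extract the repository paths from a decoded Query Builder response."""
    return [hit["path"] for hit in result.get("hits", [])]
```

The URL can be fetched with any authenticated HTTP client, and hit_paths applied to the decoded JSON gives the list of matching asset paths.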

4. Pull Metadata Using AEM API (cURL)

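You can use cURL against the Assets HTTP API to fetch metadata for a PDF asset. The path after /api/assets is relative to /content/dam, and admin:admin stands in for your own credentials: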

curl -u admin:admin -X GET http://<aem-instance>:4502/api/assets/<path-to-pdf>.json

The response is JSON. The accessibility-related fields look like this (simplified excerpt):

{
  "dc:title": "Example PDF",
  "dc:language": "en",
  "pdfuaid:part": "1",
  "pdf:Tagged": "true"
}
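
If you post-process that response in a script, something like the sketch below can pull out the accessibility fields. It is written under an assumption: depending on the AEM version, the fields may appear at the top level (as in the excerpt above), under "properties", or under "properties" → "metadata", so the code checks all three. The function names are hypothetical.

```python
import json

FIELDS = ("dc:title", "dc:language", "pdfuaid:part", "pdf:Tagged")


def accessibility_summary(payload):
    """Collect accessibility fields from a decoded asset JSON response."""
    properties = payload.get("properties", {})
    # Assumed locations, checked in order; the first occurrence of a field wins.
    candidates = (payload, properties, properties.get("metadata", {}))
    found = {}
    for source in candidates:
        for name in FIELDS:
            if name in source and name not in found:
                found[name] = source[name]
    return found


def is_pdfua_tagged(summary):
    """True when the asset declares PDF/UA-1 and is marked as tagged."""
    declares_ua1 = str(summary.get("pdfuaid:part")) == "1"
    tagged = str(summary.get("pdf:Tagged")).lower() == "true"
    return declares_ua1 and tagged


if __name__ == "__main__":
    sample = json.loads('{"dc:title": "Example PDF", "pdfuaid:part": "1", "pdf:Tagged": "true"}')
    print(is_pdfua_tagged(accessibility_summary(sample)))
```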

5. Automate Metadata Extraction Using a Custom AEM Workflow

If you want automated accessibility metadata extraction, you can create a custom workflow that:

  • Extracts PDF metadata using a PDF library (e.g., Apache PDFBox or the Acrobat SDK).
  • Updates metadata fields in AEM DAM.

Example Workflow Steps:

  1. Trigger Workflow on PDF Upload.
  2. Use Apache PDFBox or Acrobat SDK to extract metadata.
  3. Write extracted metadata to AEM's JCR.
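
To illustrate step 3: inside a workflow, the write would normally happen in a Java process step that modifies the asset's jcr:content/metadata node and commits the change. As a stand-in that can be tried from outside AEM, the sketch below builds a Sling POST servlet request that sets properties on that same node. It assumes the pdfuaid and pdf namespace prefixes are registered in the repository and that the instance accepts basic-auth POSTs; the host, path, credentials, and function name are hypothetical.

```python
import base64
import urllib.request
from urllib.parse import quote, urlencode


def metadata_update_request(base_url, asset_path, fields, user, password):
    """Build a Sling POST request that sets properties on the metadata node."""
    node_path = quote(f"{asset_path}/jcr:content/metadata", safe="/:")
    url = base_url.rstrip("/") + node_path
    # Each form field becomes a property on the target node.
    body = urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


if __name__ == "__main__":
    req = metadata_update_request(
        "http://localhost:4502",
        "/content/dam/docs/report.pdf",
        {"pdfuaid:part": "1", "pdf:Tagged": "true"},
        "admin",
        "admin",
    )
    # Sending is left commented out so the sketch has no side effects:
    # urllib.request.urlopen(req)
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the part that decides what is written easy to inspect before anything touches the repository.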

1 Reply


Thanks, this is very informative.

Could you explain a bit more about "Write extracted metadata to AEM's JCR"? Can this be achieved via a workflow, or does it need to be done manually?