Below are several ways to retrieve accessibility metadata from a PDF stored in AEM Assets (DAM) with Dynamic Media.
1. Extract Metadata Using AEM Metadata Schema Editor
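You can configure an AEM metadata schema to show accessibility-related metadata fields on the asset Properties page. The schema only controls which stored properties are displayed; it does not extract values from the PDF itself.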
Steps:
- Navigate to Metadata Schemas:
- Go to AEM → Tools → Assets → Metadata Schemas.
- Edit the Default Metadata Schema:
- Select an existing metadata schema (e.g., default).
- Click Edit.
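- Add a New Metadata Field for Accessibility:
- Add a text field to the schema form (for example, Single Line Text).
- Map the field to the property ./jcr:content/metadata/pdfuaid:part (the PDF/UA identifier).
- Use a "Text" or "Number" field type. The value is the PDF/UA part number (for example, 1 for PDF/UA-1), not a Yes/No flag.
- Save and apply to PDF assets.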
- View Metadata for a PDF:
- Go to AEM Assets.
- Select a PDF file and click Properties.
- Navigate to the Metadata tab to see pdfuaid:part and other accessibility metadata.
2. Extract Metadata Using CRXDE Lite
AEM stores metadata in JCR (Java Content Repository). You can check PDF metadata in CRXDE Lite.
Steps:
- Open CRXDE Lite:
- Navigate to http://<aem-instance>:4502/crx/de.
- Locate the PDF in the DAM:
- Go to /content/dam/<your-folder>/<pdf-file>.pdf/jcr:content/metadata.
- Check Accessibility Metadata Fields:
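- Look for properties such as the following (which ones are present depends on the file and on what was extracted at upload):
- dc:title (Title)
- dc:language (Document Language)
- pdfuaid:part (PDF/UA Compliance)
- pdf:Tagged (Tagged PDF: true/false)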
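The same node can usually be read outside CRXDE Lite, because Sling renders a JCR node as JSON when ".json" is appended to its path. The following is a minimal sketch, assuming that default JSON rendering is enabled on the instance (it is often restricted on hardened or publish instances). The function names are hypothetical helpers, not part of any AEM library; they only build the URL and filter an already-fetched metadata dictionary down to the fields listed above.

```python
from urllib.parse import quote

# Fields this section lists as accessibility-related.
ACCESSIBILITY_FIELDS = ("dc:title", "dc:language", "pdfuaid:part", "pdf:Tagged")


def metadata_json_url(base_url: str, asset_path: str) -> str:
    """Build the URL of an asset's metadata node rendered as JSON.

    Assumption: the instance allows Sling's default ".json" rendering.
    """
    node_path = f"{asset_path}/jcr:content/metadata.json"
    # Keep "/" and ":" readable; percent-encode spaces and other characters.
    return base_url.rstrip("/") + quote(node_path, safe="/:")


def pick_accessibility_fields(metadata: dict) -> dict:
    """Return only the accessibility fields that are present on the node."""
    return {key: metadata[key] for key in ACCESSIBILITY_FIELDS if key in metadata}
```

For example, metadata_json_url("http://localhost:4502", "/content/dam/docs/report.pdf") yields a URL you can open in a browser or pass to cURL with your credentials.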
3. Retrieve PDF Metadata via AEM Query Builder API
AEM provides an API to query metadata, which you can use to extract accessibility information.
Example Query to Find Accessible PDFs
Use AEM Query Builder at:
http://<aem-instance>:4502/libs/cq/search/content/querydebug.html
Enter this query to list all PDF/UA-compliant PDFs:
path=/content/dam
type=dam:Asset
property=jcr:content/metadata/pdfuaid:part
property.value=1
p.limit=-1
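This returns every asset under /content/dam whose pdfuaid:part property equals 1, meaning the file declares PDF/UA-1 conformance. The property is a claim stored in the file's metadata; it does not by itself verify that the PDF is accessible. Note that p.limit=-1 returns all matches, which can be slow on large repositories.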
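To run the same query from a script instead of the debugger page, the predicates can be sent to the Query Builder JSON servlet at /bin/querybuilder.json. The sketch below only builds the request URL; the function name and its defaults are hypothetical, and authentication and the HTTP call itself are left to the caller.

```python
from urllib.parse import urlencode


def querybuilder_url(base_url: str, part: str = "1",
                     root: str = "/content/dam", limit: int = -1) -> str:
    """Build a Query Builder URL using the same predicates as the query above."""
    predicates = {
        "path": root,
        "type": "dam:Asset",
        "property": "jcr:content/metadata/pdfuaid:part",
        "property.value": part,
        "p.limit": str(limit),  # -1 returns all matches; prefer a bound on large DAMs
    }
    # urlencode percent-encodes "/" and ":" inside the predicate values.
    return f"{base_url.rstrip('/')}/bin/querybuilder.json?{urlencode(predicates)}"
```

Passing a positive limit (for example, limit=100) is a safer default when the size of the repository is unknown.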
4. Pull Metadata Using AEM API (cURL)
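You can use cURL to request a PDF asset from the Assets HTTP API. The path after /api/assets is relative to /content/dam, and admin:admin should be replaced with your own credentials:
curl -u admin:admin "http://<aem-instance>:4502/api/assets/<path-to-pdf>.json"
The response is JSON. Its metadata includes fields such as the following (simplified):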
{
  "dc:title": "Example PDF",
  "dc:language": "en",
  "pdfuaid:part": "1",
  "pdf:Tagged": "true"
}
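Once the response has been fetched, the accessibility fields can be pulled out with a few lines of code. This is a minimal sketch with a hypothetical function name. It assumes the metadata is either flat, as in the example response, or nested under "properties" and then "metadata"; check the actual response from your instance, since the exact shape is an assumption here.

```python
import json


def accessibility_summary(response_text: str) -> dict:
    """Summarise accessibility-related fields from an asset JSON response."""
    doc = json.loads(response_text)
    # Assumption: metadata may be nested under properties.metadata;
    # otherwise fall back to the flat shape shown in the example response.
    metadata = doc.get("properties", {}).get("metadata", doc)
    part = metadata.get("pdfuaid:part")
    tagged = metadata.get("pdf:Tagged")
    return {
        "title": metadata.get("dc:title"),
        "language": metadata.get("dc:language"),
        # Normalise to a string so "1" and 1 compare the same way.
        "pdfua_part": None if part is None else str(part),
        # Convert "true"/"false" strings (or booleans) to a bool.
        "tagged": None if tagged is None else str(tagged).lower() == "true",
    }
```

Missing fields come back as None, which makes it easy to report PDFs that carry no PDF/UA identifier at all.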
5. Automate Metadata Extraction Using a Custom AEM Workflow
If you want accessibility metadata extracted automatically, you can create a custom workflow that:
- Extracts PDF metadata using a PDF library such as Apache PDFBox or the Acrobat SDK.
- Updates metadata fields in AEM DAM.
Example Workflow Steps:
- Trigger the workflow on PDF upload.
- Use Apache PDFBox or the Acrobat SDK to extract metadata.
- Write the extracted metadata to the asset's jcr:content/metadata node in the JCR.
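The extraction step can be illustrated independently of AEM. In a real AEM workflow this logic would live in a Java class implementing com.adobe.granite.workflow.exec.WorkflowProcess and would likely use a PDF library; the Python sketch below only shows the idea of reading pdfuaid:part from the PDF's embedded XMP packet. It assumes the XMP packet is stored uncompressed in the file, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Matches the <x:xmpmeta> element of an uncompressed XMP packet.
_XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_pdfua_part(pdf_bytes: bytes):
    """Return the pdfuaid:part value from a PDF's XMP packet, or None."""
    match = _XMP_RE.search(pdf_bytes)
    if match is None:
        return None  # no uncompressed XMP packet found
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        return None  # malformed XMP
    # XMP allows the property as a child element ...
    element = root.find(f".//{{{PDFUAID_NS}}}part")
    if element is not None and element.text:
        return element.text.strip()
    # ... or as an attribute on rdf:Description.
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        value = description.get(f"{{{PDFUAID_NS}}}part")
        if value:
            return value.strip()
    return None
```

The returned value is what the final workflow step would write to the pdfuaid:part property on the asset's metadata node.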