
SOLVED

Content quality checks


Community Advisor

The majority of our assets are ingested through RESTful APIs, and the metadata often fails to meet compliance standards. I am seeking recommendations for frameworks or approaches that can be used to

  • validate metadata via RESTful services.
  • perform regular content quality checks for existing assets.

Aanchal Sikka

1 Accepted Solution


Correct answer by
Community Advisor

Thanks @Tethich @Fanindra_Surat @A_H_M_Imrul @narendragandhi for your responses. All the approaches seem valid for different use cases.

Sharing my thought process, after borrowing your amazing suggestions.

 

Key Stages for Data Validation

  1. Existing Data:

    • Prioritize validation of severe issues that have a significant impact on metadata quality.
    • Address medium or low-impact issues progressively as part of a cleanup plan.
  2. Incoming Data:

    • Implement rules and workflows to ensure compliance before data ingestion.
    • Relax rules strategically when needed but maintain processes to identify and correct discrepancies later.
  3. Evolving Rules:

    • Introduce and enforce new validation rules as new business requirements, compliance needs, or metadata values emerge.

Leveraging Reports Over Time

Validation efforts should serve the needs of both technical and business teams:

  • Technical Teams:
    • Perform full-system health checks regularly to identify gaps across all assets.
  • Business Teams:
    • Validate and clean subsets of data in small or large batches.
    • Revalidate after cleanup to ensure compliance with updated rules.
    • Employ user-friendly tools to facilitate incremental cleanups without technical intervention.

Reports and Visualizations

  • Reports: Generate segmented reports tailored to specific teams, stakeholders, or metadata filters.
  • Visual Dashboards: Use tools like Power BI to create intuitive dashboards that visualize validation progress, compliance, and areas needing attention.
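For the dashboard piece, one low-effort option is to flatten validation results into a CSV that Power BI (or any BI tool) can pick up as a data source. A minimal sketch in Java, where the ValidationResult shape and field names are purely illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ValidationCsvExporter {

    // Hypothetical shape of a single validation finding.
    public record ValidationResult(String assetPath, String field, String rule, String severity) {}

    // Writes results as CSV so Power BI can consume the file as a data source.
    public static void export(List<ValidationResult> results, Path target) throws IOException {
        String header = "assetPath,field,rule,severity";
        String body = results.stream()
                .map(r -> String.join(",", r.assetPath(), r.field(), r.rule(), r.severity()))
                .collect(Collectors.joining("\n"));
        Files.writeString(target, header + "\n" + body);
    }
}
```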

Technical Approaches for Data Validation

  1. Scheduled Health Reports: Automate periodic checks using schedulers to assess overall system health (a sketch follows after this list).

  2. Extend ACS Commons Reports to validate assets and generate targeted reports by query, path, or metadata filters.

  3. External validations before ingesting data: This would definitely be a good approach to lighten the AEM load. However, there are a few challenges:

    • Dependency on another team to update the rules

    • We would still need an approach to identify issues in existing assets and assets updated manually, and to gradually clean up medium/low-priority data.
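To make option 1 above concrete, here is a minimal sketch of a scheduled health check as an OSGi component driven by the Sling scheduler. The cron expression, sub-service name, DAM path, and the dc:rights rule are all assumptions for illustration, not the actual setup:

```java
import java.util.Iterator;
import java.util.Map;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.apache.sling.api.resource.ValueMap;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Runs nightly at 02:00; cron expression and sub-service name are assumptions.
@Component(service = Runnable.class,
           property = {"scheduler.expression=0 0 2 * * ?"})
public class MetadataHealthCheckJob implements Runnable {

    @Reference
    private ResourceResolverFactory resolverFactory;

    @Override
    public void run() {
        Map<String, Object> auth =
                Map.of(ResourceResolverFactory.SUBSERVICE, "metadata-health-check");
        try (ResourceResolver resolver = resolverFactory.getServiceResourceResolver(auth)) {
            // Find assets under an example path; query and checked property are examples only.
            String query = "SELECT * FROM [dam:Asset] AS a "
                         + "WHERE ISDESCENDANTNODE(a, '/content/dam/my-project')";
            Iterator<Resource> assets = resolver.findResources(query, "JCR-SQL2");
            while (assets.hasNext()) {
                Resource asset = assets.next();
                Resource metadata = asset.getChild("jcr:content/metadata");
                ValueMap props = metadata != null ? metadata.getValueMap() : ValueMap.EMPTY;
                if (!props.containsKey("dc:rights")) {
                    // Report or persist the finding, e.g. under /var (see the storage sketch below).
                }
            }
        } catch (Exception e) {
            // Log and let the next scheduled run retry.
        }
    }
}
```

The findings from such a job could then feed the /var storage and the reports described next.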


I am inclined towards custom reporting and storage, using:

  1. "DAM Update Asset workflow + ACS Commons Reports" for medium/low compliance issues
  2. Rejecting assets with severe compliance issues.

Store validation results for non-compliant assets in /var nodes, enabling:

  • Extracting targeted reports for metadata-specific cleanup tasks.
  • Easy mapping to visual dashboards for better analysis.
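A minimal sketch of how a finding could be persisted under /var with the Sling ResourceResolver API; the /var/metadata-compliance path, node naming, and property names are invented for illustration:

```java
import java.util.Map;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceUtil;

public class ComplianceResultWriter {

    // Root for stored findings; hypothetical path.
    private static final String RESULTS_ROOT = "/var/metadata-compliance";

    // Records one non-compliant field for an asset as an nt:unstructured node.
    public static void record(ResourceResolver resolver, String assetPath,
                              String field, String severity) throws PersistenceException {
        // Create /var/metadata-compliance if it does not exist yet.
        Resource root = ResourceUtil.getOrCreateResource(
                resolver, RESULTS_ROOT, "nt:unstructured", "nt:unstructured", false);
        // One child node per finding, keyed by a sanitized asset path + field.
        String nodeName = (assetPath + "-" + field).replaceAll("[^A-Za-z0-9]", "_");
        resolver.create(root, nodeName, Map.of(
                "jcr:primaryType", "nt:unstructured",
                "assetPath", assetPath,
                "field", field,
                "severity", severity,
                "checkedAt", java.util.Calendar.getInstance()));
        resolver.commit();
    }
}
```

Note that ResourceResolver.create fails if the node already exists, so a real implementation would update or replace prior findings for the same asset/field.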

Please do share your thoughts if you see any challenges/improvements.


Aanchal Sikka


8 Replies


Level 7

Hi @aanchal-sikka 

 

Is it possible to expand a little more on the subject? What does your current import process look like? How do all the parties involved in the ingestion integrate? When and where do you expect the regular quality checks on the content to happen? What does it mean for you that metadata compliance is failing? Some diagrams or screenshots also would not hurt.


Community Advisor

@Tethich 

 

Thanks for the queries. Sharing details below:

  • The import is done by AEM, which reads the asset and its related metadata from a location.
  • Integration is a pull mechanism from AEM, reading S3 buckets.
  • Checks can happen at asset import:
    • Assets are rejected if there is a severe compliance issue in the metadata.
    • Assets are accepted with minor compliance issues in the metadata; these should be reported by regular health checks.

Non-compliance can mean, for example, metadata that falls outside a specific set of allowed values or does not match the expected format.
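To illustrate what such a rule could look like in code, here is a minimal sketch that checks one field against an allowed-value set and another against a format pattern; the field names and rules are examples only, not the actual schema:

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class MetadataRules {

    // Example rule 1: field must come from a fixed vocabulary (values are hypothetical).
    private static final Set<String> ALLOWED_REGIONS = Set.of("EMEA", "APAC", "AMER");

    // Example rule 2: field must match a format, e.g. an ISO date.
    private static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Returns true only if both example rules pass for the given metadata map.
    public static boolean isCompliant(Map<String, String> metadata) {
        String region = metadata.get("custom:region");
        String expiry = metadata.get("custom:expiryDate");
        return region != null && ALLOWED_REGIONS.contains(region)
                && expiry != null && ISO_DATE.matcher(expiry).matches();
    }
}
```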


Aanchal Sikka


Community Advisor

Hi @aanchal-sikka 

 

For the regular scheduled quality check, it can be a scheduled job that gets the list of assets updated in a given time period, performs the specified checks, and then adds/updates a metadata field that keeps track of the last review date.
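A minimal sketch of the "stamp the last review date" part of that job; the lastReviewDate property name is an example, and the list of recently updated assets would come from a query on jcr:content/jcr:lastModified or similar:

```java
import java.util.Calendar;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;

public class ReviewStamper {

    // Writes/updates a review timestamp on the asset's metadata node.
    public static void stampReviewed(Resource asset) throws PersistenceException {
        Resource metadata = asset.getChild("jcr:content/metadata");
        if (metadata == null) {
            return; // nothing to stamp
        }
        ModifiableValueMap props = metadata.adaptTo(ModifiableValueMap.class);
        if (props != null) {
            props.put("lastReviewDate", Calendar.getInstance());
            asset.getResourceResolver().commit();
        }
    }
}
```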

 

For the API to fetch metadata of assets, you can explore this - https://developer.adobe.com/experience-cloud/experience-manager-apis/api/experimental/assets/author/
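For completeness, fetching metadata over HTTP could look roughly like the sketch below; the actual endpoint path, authentication, and response shape must come from the linked documentation, so the URL passed in here is only a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MetadataFetcher {

    // Fetches the metadata JSON for an asset; endpointUrl is a placeholder taken from the docs.
    public static String fetchMetadataJson(String endpointUrl, String bearerToken)
            throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpointUrl))
                .header("Authorization", "Bearer " + bearerToken)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON to validate against the compliance rules
    }
}
```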

 

Hope this helps!

 

Narendra

 


Level 7

I am thinking that you might lift some of the burden from AEM and have a separate app, maybe built with microservices, that runs periodically, checks the objects' metadata in Amazon S3 against your criteria, and marks the metadata accordingly. That way, when AEM pulls the assets, it will already know whether the data was validated before ingesting it.
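A minimal sketch of that external validator with the AWS SDK for Java v2: it reads an object's user metadata and records the outcome as an object tag, so AEM can check the tag at pull time. The bucket, key, the "region" rule, and the tag name are all assumptions:

```java
import java.util.Map;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectTaggingRequest;
import software.amazon.awssdk.services.s3.model.Tag;
import software.amazon.awssdk.services.s3.model.Tagging;

public class S3MetadataValidator {

    private final S3Client s3 = S3Client.create();

    // Validates one object's user metadata and records the result as an S3 object tag.
    public void validateAndMark(String bucket, String key) {
        Map<String, String> userMetadata = s3.headObject(
                HeadObjectRequest.builder().bucket(bucket).key(key).build()).metadata();

        // Example rule: a "region" entry must be present (rule is illustrative only).
        boolean compliant = userMetadata.containsKey("region");

        Tagging tagging = Tagging.builder()
                .tagSet(Tag.builder()
                        .key("metadata-validated")
                        .value(compliant ? "pass" : "fail")
                        .build())
                .build();
        s3.putObjectTagging(PutObjectTaggingRequest.builder()
                .bucket(bucket).key(key).tagging(tagging).build());
    }
}
```

Keep in mind that putObjectTagging replaces the object's entire tag set, so existing tags would need to be merged in a real implementation.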

 

Some good ways to implement a validator were already posted here.


Community Advisor

Hello @aanchal-sikka ,

I hope you're doing well.

 

  • Given the additional computational overhead, it might be better to perform validation or sanity checks outside of AEM, prior to asset ingestion, if feasible.

  • If the compliance violations become a problem once assets are exposed to traffic on the publisher, could we consider using a replication interceptor (such as a replication preprocessor) to validate assets and allow only compliant ones to be replicated? (A sketch follows below.)

  • To monitor faulty assets, we could set up a scheduled Sling job to generate reports that identify non-compliant entries.
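A minimal sketch of the replication-preprocessor idea mentioned above: implementing com.day.cq.replication.Preprocessor and aborting activation with a ReplicationException when an asset is non-compliant. The compliance check itself is stubbed out, and the /content/dam path filter is an assumption:

```java
import com.day.cq.replication.Preprocessor;
import com.day.cq.replication.ReplicationAction;
import com.day.cq.replication.ReplicationActionType;
import com.day.cq.replication.ReplicationException;
import com.day.cq.replication.ReplicationOptions;
import org.osgi.service.component.annotations.Component;

@Component(service = Preprocessor.class)
public class CompliancePreprocessor implements Preprocessor {

    @Override
    public void preprocess(ReplicationAction action, ReplicationOptions options)
            throws ReplicationException {
        if (action == null || action.getType() != ReplicationActionType.ACTIVATE) {
            return; // only guard activations
        }
        String path = action.getPath();
        if (path.startsWith("/content/dam/") && !isCompliant(path)) {
            // Throwing here aborts replication of the non-compliant asset.
            throw new ReplicationException("Asset fails metadata compliance: " + path);
        }
    }

    // Placeholder for the actual metadata compliance check.
    private boolean isCompliant(String assetPath) {
        return true;
    }
}
```

In practice, isCompliant would resolve the asset's metadata via a service ResourceResolver and apply the same rules used elsewhere.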

Let me know if this approach makes sense.


Community Advisor

Hi @aanchal-sikka -

 

My thoughts -

From what you shared, you have a custom process set up in AEM that reads assets and metadata from an S3 location. Are you referring to issues within the metadata that is stored separately, or to metadata such as XMP that is extracted out of the asset itself?

If it is the separately managed metadata: can you not validate or run your compliance check during the ingestion phase of your custom process?

If it is the XMP metadata: you will need to create a custom process and configure it to be invoked either as part of the DAM metadata writeback workflow itself or via a separate scheduler, as per the need (see the sketch below).
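For the workflow-step variant, a minimal sketch of a custom process step against the Granite workflow API; the process.label, payload handling, and the stubbed validation are assumptions:

```java
import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.WorkItem;
import com.adobe.granite.workflow.exec.WorkflowProcess;
import com.adobe.granite.workflow.metadata.MetaDataMap;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.osgi.service.component.annotations.Component;

// Registered as a workflow process step; the process.label is an example.
@Component(service = WorkflowProcess.class,
           property = {"process.label=Validate Asset Metadata"})
public class MetadataValidationProcess implements WorkflowProcess {

    @Override
    public void execute(WorkItem item, WorkflowSession session, MetaDataMap args)
            throws WorkflowException {
        String payloadPath = item.getWorkflowData().getPayload().toString();
        ResourceResolver resolver = session.adaptTo(ResourceResolver.class);
        if (resolver == null) {
            throw new WorkflowException("Could not adapt workflow session to a resolver");
        }
        Resource metadata = resolver.getResource(payloadPath + "/jcr:content/metadata");
        // Placeholder: run the compliance rules against 'metadata' and
        // record or report any violations (e.g. under /var, as discussed above).
    }
}
```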

 

Regards,

Fani


Administrator

@aanchal-sikka Did you find the suggestions helpful? Please let us know if you require more information. Otherwise, please mark the answer as correct for posterity. If you've discovered a solution yourself, we would appreciate it if you could share it with the community. Thank you!



Kautuk Sahni
