Thanks @tethich @fanindra_surat @a_h_m_imrul @narendragandhi for your responses. All the approaches seem valid for different use cases.
Sharing my thought process, after borrowing your amazing suggestions.
Key Stages for Data Validation
- Existing Data:
  - Prioritize validation of severe issues that have a significant impact on metadata quality.
  - Address medium- and low-impact issues progressively as part of a cleanup plan.
- Incoming Data:
  - Implement rules and workflows to ensure compliance before data ingestion.
  - Relax rules strategically when needed, but maintain processes to identify and correct discrepancies later.
- Evolving Rules:
  - Introduce and enforce new validation rules as new business requirements, compliance needs, or metadata values emerge.
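To make the severity tiers above concrete, here is a minimal plain-Java sketch of a tiered rule set, where severe failures block ingestion and medium/low failures are queued for cleanup. The field names (dc:title, dc:rights, dc:description) and thresholds are illustrative assumptions, not actual AEM or ACS Commons APIs.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of severity-tiered metadata validation (rules are illustrative).
public class MetadataValidator {

    public enum Severity { SEVERE, MEDIUM, LOW }

    public static class Rule {
        final String field;
        final Severity severity;
        final Predicate<String> check;
        Rule(String field, Severity severity, Predicate<String> check) {
            this.field = field; this.severity = severity; this.check = check;
        }
    }

    // Example rule set: severe issues block ingestion, medium/low go to a cleanup backlog.
    public static final List<Rule> RULES = List.of(
        new Rule("dc:title", Severity.SEVERE, v -> v != null && !v.isBlank()),
        new Rule("dc:rights", Severity.MEDIUM, v -> v != null && !v.isBlank()),
        new Rule("dc:description", Severity.LOW, v -> v != null && v.length() >= 10)
    );

    // Returns the severities of all failed rules for one asset's metadata.
    public static List<Severity> validate(Map<String, String> metadata) {
        List<Severity> failures = new ArrayList<>();
        for (Rule r : RULES) {
            if (!r.check.test(metadata.get(r.field))) {
                failures.add(r.severity);
            }
        }
        return failures;
    }

    // Only severe failures should reject an asset outright.
    public static boolean shouldReject(Map<String, String> metadata) {
        return validate(metadata).contains(Severity.SEVERE);
    }
}
```

The same rule list can then serve both the ingestion gate (shouldReject) and the progressive cleanup reports (validate, filtered to medium/low).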
Leveraging Reports Over Time
Validation efforts should serve the needs of both technical and business teams:
- Technical Teams:
  - Perform regular full-system health checks to identify gaps across all assets.
- Business Teams:
  - Validate and clean subsets of data in small or large batches.
  - Revalidate after cleanup to ensure compliance with updated rules.
  - Employ user-friendly tools to facilitate incremental cleanups without technical intervention.
Reports and Visualizations
- Reports: Generate segmented reports tailored to specific teams, stakeholders, or metadata filters.
- Visual Dashboards: Use tools like Power BI to create intuitive dashboards that visualize validation progress, compliance, and areas needing attention.
Technical Approaches for Data Validation
- Scheduled Health Reports: Automate periodic checks using schedulers to assess overall system health.
- Extended ACS Commons Reports: Extend ACS Commons Reports to validate assets and generate targeted reports by query, path, or metadata filters.
- External Validation Before Ingestion: This would definitely have been a good approach to lighten the AEM load; however, it introduces a few challenges of its own.
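For the scheduled health report idea, in AEM this would typically run as a Sling scheduled job; the plain-Java sketch below uses a ScheduledExecutorService just to show the shape. The compliance metric and the list of per-asset results are stand-ins for whatever the real health check queries.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of a periodic health report. In AEM this would usually be a Sling
// scheduled job; the scheduler and asset results here are illustrative stand-ins.
public class HealthReportScheduler {

    // Compliance = fraction of assets whose validation passed, as a percentage.
    public static double compliancePercent(List<Boolean> assetResults) {
        if (assetResults.isEmpty()) return 100.0;
        long passed = assetResults.stream().filter(b -> b).count();
        return 100.0 * passed / assetResults.size();
    }

    // Runs the health check on a fixed schedule (e.g. nightly in practice).
    public static ScheduledExecutorService start(Runnable healthCheck, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(healthCheck, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Keeping the metric computation separate from the scheduling makes the same check reusable for on-demand batch reports by the business teams.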
I am inclined towards custom reporting and storage, using:
- The DAM Update Asset workflow plus ACS Commons Reports for medium- and low-severity compliance issues.
- Rejection of assets with severe compliance issues.

Storing validation results for non-compliant assets in /var nodes would enable:
- Extracting targeted reports for metadata-specific cleanup tasks.
- Easy mapping to visual dashboards for better analysis.
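A rough sketch of the /var storage idea, with an in-memory map standing in for the repository; a real implementation would write JCR nodes via the Sling Resource API, and the /var/dam-validation root and field keys here are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of persisting per-asset validation results under a /var subtree.
// A map keyed by node path stands in for the repository; the
// /var/dam-validation root is a hypothetical path.
public class ValidationResultStore {

    static final String ROOT = "/var/dam-validation";
    private final Map<String, Map<String, String>> nodes = new TreeMap<>();

    // Store one asset's failed checks, e.g. {"dc:rights": "MEDIUM"}.
    public void store(String assetPath, Map<String, String> failedChecks) {
        nodes.put(ROOT + assetPath, new HashMap<>(failedChecks));
    }

    // Extract a targeted report: all assets failing a given field,
    // ready to feed a cleanup task or a dashboard export.
    public List<String> assetsFailing(String field) {
        return nodes.entrySet().stream()
            .filter(e -> e.getValue().containsKey(field))
            .map(e -> e.getKey().substring(ROOT.length()))
            .collect(Collectors.toList());
    }
}
```

Keeping results keyed by asset path under one root makes it straightforward to export field-filtered slices to a Power BI dashboard or a batch cleanup list.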
Please do share your thoughts if you see any challenges or improvements.