My project has a requirement to restrict asset uploads if an asset contains personal data. How can this be achieved on AEMaaCS?
@vpasam You can write a simple workflow step that strips PII from the asset metadata during the publishing process, so that the delivered or consumed asset won't carry it.
Below is sample code; you can add other properties too if you know of any.
/**
 * The method called by the AEM Workflow Engine to perform Workflow work.
 *
 * @param workItem the work item representing the resource moving through the Workflow
 * @param workflowSession the workflow session
 * @param args arguments for this Workflow Process defined on the Workflow Model (PROCESS_ARGS, argSingle, argMulti)
 * @throws WorkflowException when the Workflow Process step cannot complete. This will cause the WF to retry.
 */
@Override
public void execute(WorkItem workItem, WorkflowSession workflowSession, MetaDataMap args) throws WorkflowException {
    // Get the workflow data (the data that is being passed through for this work item)
    final WorkflowData workflowData = workItem.getWorkflowData();
    final String type = workflowData.getPayloadType();
    final ResourceResolver resourceResolver = workflowSession.adaptTo(ResourceResolver.class);
    // Only handle payloads that are JCR paths; the other (less common) type is JCR_UUID
    if (resourceResolver == null || !StringUtils.equals(type, TYPE_JCR_PATH)) {
        return;
    }
    // Get the asset path from the payload; getAssetPathFromPayload(...) is a helper of this class (not shown)
    final String path = getAssetPathFromPayload(workflowData);
    log.debug("MetadataCleanup Payloadpath:: {} ", path);
    // Resolve the metadata node and guard against missing resources to avoid a NullPointerException
    final Resource assetResource = resourceResolver.getResource(path);
    final Resource assetMetadataRes = assetResource == null ? null : assetResource.getChild("jcr:content/metadata");
    final ModifiableValueMap modifiableValueMap =
            assetMetadataRes == null ? null : assetMetadataRes.adaptTo(ModifiableValueMap.class);
    if (modifiableValueMap == null) {
        log.debug("No writable metadata found for payload '{}'; skipping.", path);
        return;
    }
    // Metadata properties that may carry personal data, mapped to the blank value that replaces them
    final Map<String, Object> properties = new HashMap<>();
    properties.put("dc:creator", new String[] { "" });
    properties.put("xmp:CreatorTool", "");
    properties.put("dam:Author", "");
    properties.put("dam:Producer", "");
    properties.put("pdf:Producer", "");
    properties.put("dc:rights", "");
    properties.put("dc:Rights", "");
    properties.put("photoshop:Credit", "");
    for (final Entry<String, Object> propertyEntry : properties.entrySet()) {
        // Remove any existing value first, then write the blank replacement
        if (modifiableValueMap.containsKey(propertyEntry.getKey())) {
            modifiableValueMap.remove(propertyEntry.getKey());
        }
        modifiableValueMap.put(propertyEntry.getKey(), propertyEntry.getValue());
        log.debug("Updating property '{}' with value '{}' for resource at path '{}'.",
                propertyEntry.getKey(), propertyEntry.getValue(), assetMetadataRes.getPath());
    }
    // Persist the changes; commit(...) is a helper of this class (not shown)
    commit(resourceResolver);
}
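If you would rather delete the properties than blank them, the core of the step can be isolated and unit-tested without an AEM instance, because Sling's ModifiableValueMap is a java.util.Map<String, Object>. The following is only a minimal sketch under that assumption: the class name MetadataScrubber and the method scrub are hypothetical, and the key list is simply the one from the sample above.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: removes PII-bearing metadata keys from any Map-backed property bag. */
final class MetadataScrubber {

    // Property names taken from the sample above; extend the set as needed.
    static final Set<String> PII_KEYS = Set.of(
            "dc:creator", "xmp:CreatorTool", "dam:Author", "dam:Producer",
            "pdf:Producer", "dc:rights", "dc:Rights", "photoshop:Credit");

    private MetadataScrubber() { }

    /**
     * Removes every key listed in PII_KEYS from the given map.
     *
     * @param metadata a mutable map, e.g. a ModifiableValueMap adapted from the metadata node
     * @return the number of properties that were actually removed
     */
    static int scrub(Map<String, Object> metadata) {
        int removed = 0;
        for (String key : PII_KEYS) {
            if (metadata.containsKey(key)) {
                metadata.remove(key);
                removed++;
            }
        }
        return removed;
    }
}
```

Inside the workflow step you would call MetadataScrubber.scrub(modifiableValueMap) and then commit the resource resolver as before. Whether removing or blanking is preferable depends on how downstream consumers treat missing metadata.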
Hi @vpasam ,
As of now, there is no OOTB utility in AEM as a Cloud Service to scan assets for PII.
There are several ways to minimize this risk, such as validating assets with third-party tools before they are uploaded to AEM, or setting up a workflow with a manual approval step.
Thanks
You could also implement a Sling filter that intercepts your upload requests and validates them as per your requirement.
Hi,
How can I read the asset content in the filter class?
You can use a custom workflow to trim/mask the metadata which might be exposing PII.
For the actual asset binary, the trick is to identify the affected assets. Once they are identified, you can use either AEM's built-in image editor or a custom image-processing library like OpenCV to blur the PII areas, and then re-upload the sanitised version to AEM.
For PDFs, libraries like Apache PDFBox or iText can be used to redact text programmatically.
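To illustrate the masking step for images, here is a minimal sketch that uses only the JDK. It swaps the blur for a solid black fill over the region, which is simpler and leaves nothing recoverable. The class RegionRedactor is hypothetical, and the rectangle coordinates are assumed to come from whatever identification step you use.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: blacks out a rectangular region of an image. */
final class RegionRedactor {

    private static final int OPAQUE_BLACK = 0xFF000000;

    private RegionRedactor() { }

    /**
     * Overwrites the rectangle starting at (x, y) with the given width and height
     * with opaque black. The rectangle is clipped to the image bounds, so
     * coordinates that reach outside the image do not throw.
     */
    static void redact(BufferedImage image, int x, int y, int width, int height) {
        int x0 = Math.max(0, x);
        int y0 = Math.max(0, y);
        int x1 = Math.min(image.getWidth(), x + width);
        int y1 = Math.min(image.getHeight(), y + height);
        for (int py = y0; py < y1; py++) {
            for (int px = x0; px < x1; px++) {
                image.setRGB(px, py, OPAQUE_BLACK);
            }
        }
    }
}
```

The image can be read and written with javax.imageio.ImageIO before and after the call. A real blur or a more robust pipeline would still need a library such as OpenCV, as suggested above.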
For identification, you can explore an OCR library such as Tesseract OCR to process images and extract text for PII scanning. Once the text is extracted, regular expressions are one option for identifying PII (emails, phone numbers, names, etc.).
Similarly, use libraries like Apache PDFBox (for PDFs) or Apache POI (for Word documents) to extract text from documents.
These services would have to be integrated into AEM as part of a post-processing workflow so that the asset is processed after it is uploaded to AEM.
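As a rough sketch of the regex step mentioned above, assuming the text has already been extracted by one of those libraries: the class PiiTextScanner and both patterns are illustrative only. They are deliberately simple, the phone pattern will also match other long digit runs, and names cannot be detected reliably with a regex at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags likely emails and phone numbers in extracted text. */
final class PiiTextScanner {

    // Illustrative patterns only; tune them for your own data and locale.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern PHONE =
            Pattern.compile("\\+?\\d[\\d\\s().-]{7,}\\d");

    private PiiTextScanner() { }

    /** Returns each match prefixed with its kind, emails first, then phone numbers. */
    static List<String> findPii(String text) {
        List<String> hits = new ArrayList<>();
        collect(EMAIL, "email", text, hits);
        collect(PHONE, "phone", text, hits);
        return hits;
    }

    /** Convenience check for a workflow step that only needs a yes/no answer. */
    static boolean containsPii(String text) {
        return !findPii(text).isEmpty();
    }

    private static void collect(Pattern pattern, String label, String text, List<String> hits) {
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(label + ":" + matcher.group());
        }
    }
}
```

A post-processing workflow step could call PiiTextScanner.containsPii(extractedText) and, on a hit, route the asset to a manual review or quarantine folder instead of publishing it.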
Hope this can help!