SOLVED

Security concern for AEM pages

Level 4

Our security team flagged the issue below.

They have created a Python script that can crawl any of our sites and fetch JCR data such as user IDs. Our sites are public.

Here is a summary of the test; the full details are attached.

The Python project is on GitHub for your testing/reference:

https://github.com/ilatypov/aem-hacker/blob/master/aem_slurper.py

$ time python3 aem_slurper.py www.manulifebank.ca 2>&1 | tee manulifebank.txt
Connecting to www.manulifebank.ca...
2020-09-28 09:23:15-0400 /content/manulife-bank/en_CA/jcr:content admin {"id": "jcr:content", "uri": "/content/manulife-bank/en_CA/jcr:content", "jcr:primaryType": "cq:PageContent", "jcr:mixinTypes": ["mix:lockable", "mix:versionable", "cq:LiveSync", "cq:PropertyLiveSyncCancelled"], "chatButton": "default", "imageSelect": "ui.icon.tab.select.value", "dateFormatField": "MMMM yyyy", "jcr:createdBy": "admin", "jcr:title": "English", "contentCategory": "none", "resourceCardType":
[...]
^C

real 1m19.645s
user 0m0.567s
sys 0m0.346s

 

Our security concern is that the script can see user IDs when it crawls pages. The user IDs are also visible under the nodes in CRX when I browse the publisher as an anonymous user.
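One way to check what an anonymous client can see is to scan the JSON the publisher returns for properties that carry user IDs. The sketch below is a minimal audit helper, assuming a hand-picked, non-exhaustive set of property names (jcr:createdBy comes from the crawler output above; the other three are common JCR/AEM audit properties) and a sample document modelled on that output. find_user_ids is a hypothetical helper, not part of aem_slurper.py.

```python
import json
from typing import Any, Iterator, Tuple

# Assumption: a non-exhaustive list of JCR/AEM properties that hold user IDs.
USER_ID_PROPERTIES = {
    "jcr:createdBy",
    "jcr:lastModifiedBy",
    "cq:lastModifiedBy",
    "cq:lastReplicatedBy",
}


def find_user_ids(node: Any, path: str = "") -> Iterator[Tuple[str, str, Any]]:
    """Walk a parsed JSON response and yield (node path, property, value)
    for every property whose name is in USER_ID_PROPERTIES."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in USER_ID_PROPERTIES:
                yield path, key, value
            # Child nodes show up as nested objects in deeper JSON dumps.
            if isinstance(value, (dict, list)):
                yield from find_user_ids(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_user_ids(item, f"{path}[{index}]")


# Sample modelled on the first fields of the crawler output above.
sample = json.loads("""
{
  "id": "jcr:content",
  "uri": "/content/manulife-bank/en_CA/jcr:content",
  "jcr:primaryType": "cq:PageContent",
  "jcr:createdBy": "admin",
  "jcr:title": "English"
}
""")

for node_path, prop, value in find_user_ids(sample, "/content/manulife-bank/en_CA/jcr:content"):
    print(node_path, prop, value)
```

Running the same scan over responses fetched from your own publisher shows which URLs leak user IDs before you decide what to block.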

 

Question: How can I hide the user IDs from external crawling scripts?

1 Accepted Solution

Correct answer by
Community Advisor

You cannot hide the user IDs.

However, you can deny external systems access to those nodes by updating your Dispatcher rules.

Apply Dispatcher filters:

 

## Deny content reading by queries and prevent unintended self-DoS attacks
/0033 { /type "deny" /selectors '(feed|pages|rss|blueprint|infinity|tidy|sysview|docview|query|[0-9-]+|jcr:content)' /extension '(json|xml|html|feed)' }

/0322 { /type "deny" /suffix '(.*infinity.*|.*children.*|.*tidy.*)' }
/0323 { /type "deny" /url '.*/[.][.];/.*' }
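To see which requests these three deny rules would catch, the sketch below emulates them in Python. It is a simplified approximation, not Dispatcher's actual algorithm: it assumes the first path segment containing a "." holds the selectors and extension, everything after that segment is the suffix, rule /0033 fires when any single selector fully matches the pattern, and every pattern must match the whole value. decompose and is_denied are hypothetical names.

```python
import re
from typing import List, Tuple

# Patterns copied from the /0033, /0322 and /0323 rules above.
SELECTOR_PATTERN = re.compile(
    r"(feed|pages|rss|blueprint|infinity|tidy|sysview|docview|query|[0-9-]+|jcr:content)")
EXTENSION_PATTERN = re.compile(r"(json|xml|html|feed)")
SUFFIX_PATTERN = re.compile(r"(.*infinity.*|.*children.*|.*tidy.*)")
URL_PATTERN = re.compile(r".*/[.][.];/.*")


def decompose(url: str) -> Tuple[List[str], str, str]:
    """Split a request path into (selectors, extension, suffix).
    Assumes no dots appear in the path before the selector segment."""
    segments = url.split("/")
    for i, segment in enumerate(segments):
        if "." in segment:
            _, _, rest = segment.partition(".")
            parts = rest.split(".")
            suffix = "/" + "/".join(segments[i + 1:]) if i + 1 < len(segments) else ""
            return parts[:-1], parts[-1], suffix
    return [], "", ""


def is_denied(url: str) -> bool:
    selectors, extension, suffix = decompose(url)
    # /0033: a matching selector combined with a matching extension.
    selector_hit = any(SELECTOR_PATTERN.fullmatch(s) for s in selectors)
    if selector_hit and EXTENSION_PATTERN.fullmatch(extension):
        return True
    # /0322: suffixes that try to reach infinity/children/tidy renderings.
    if suffix and SUFFIX_PATTERN.fullmatch(suffix):
        return True
    # /0323: the "/..;/" path-traversal trick.
    return URL_PATTERN.fullmatch(url) is not None


for url in ("/content/manulife-bank/en_CA.infinity.json",
            "/content/manulife-bank/en_CA.1.json",
            "/content/manulife-bank/en_CA.model.json"):
    print(url, "denied" if is_denied(url) else "allowed")
```

Under these assumptions the .infinity.json and .1.json requests are denied, while a .model.json request (an assumed example of the JSON your other applications consume) is allowed, which is why selector-based rules can block bulk dumps without switching JSON off entirely. A plain .json request with no selectors is not caught by /0033 in this model, so compare the rules against the URLs that actually appear in access_log.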

 

Check the DoS (denial of service) rules and the security checklist for more details:

https://experienceleague.adobe.com/docs/experience-manager-65/administering/security/security-checkl...

 

 

Level 4
Thank you for the solution, Suresh. In our project I still need to expose AEM content and assets as JSON for other applications to consume. Could you please provide a rule that blocks just the Python script from crawling?

Community Advisor

You can try something like this:

## Block Python script

RewriteRule ^.*py - [F,NC,L]

 

or

*.py
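A rewrite rule only sees the HTTP request, and the request never contains the name of the program that sent it. The sketch below applies the ^.*py regex and the *.py glob to a few hypothetical request paths of the kind a JSON crawler sends, plus an ordinary page whose name contains "py". Python's re.search with IGNORECASE stands in for mod_rewrite's [NC] matching, which is an approximation.

```python
import fnmatch
import re

# The rewrite pattern from the reply above; [NC] means case-insensitive.
REWRITE_PATTERN = re.compile(r"^.*py", re.IGNORECASE)

# Hypothetical request paths: crawler-style JSON requests plus one
# ordinary page whose name happens to contain "py".
requests_seen = [
    "/content/manulife-bank/en_CA/jcr:content.json",
    "/content/manulife-bank/en_CA.1.json",
    "/content/manulife-bank/en_CA/happy-holidays.html",
]

for path in requests_seen:
    blocked_by_rewrite = REWRITE_PATTERN.search(path) is not None
    blocked_by_glob = fnmatch.fnmatchcase(path, "*.py")
    print(f"{path}: rewrite={blocked_by_rewrite} glob={blocked_by_glob}")
```

Neither crawler-style path matches either pattern. Because ^.*py has no end anchor, it does match the ordinary happy-holidays.html page, so as written it can also block legitimate content.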

Level 4
I tried adding that option to the Dispatcher rewrite rules, but it did not fix the issue. I think this is because the Python script runs on my local machine and simply requests various .json URLs to fetch the data, so the Dispatcher has no information about the script itself; in access_log I don't see any .py file. We can't stop JSON from being rendered because we use the .json output. I was wondering whether there is a way to stop outputting JCR properties such as createdBy or modifiedBy.
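If the JSON output has to stay enabled, one option is to drop user-ID properties from the response before it reaches the client. The sketch below shows the idea as post-processing of a JSON body, for example in a proxy placed in front of the publisher. filter_response_body and HIDDEN_PROPERTIES are hypothetical names, and the property list is an assumption to adapt. This is not an AEM or Dispatcher feature: inside AEM the same idea would need server-side Java code, and the values would still be readable through CRX as described above.

```python
import json
from typing import Any

# Assumption: a non-exhaustive set of properties to hide; adapt to your content.
HIDDEN_PROPERTIES = frozenset({
    "jcr:createdBy",
    "jcr:lastModifiedBy",
    "cq:lastModifiedBy",
    "cq:lastReplicatedBy",
})


def strip_user_ids(node: Any) -> Any:
    """Return a copy of a parsed JSON document with HIDDEN_PROPERTIES
    removed at every nesting level."""
    if isinstance(node, dict):
        return {key: strip_user_ids(value)
                for key, value in node.items()
                if key not in HIDDEN_PROPERTIES}
    if isinstance(node, list):
        return [strip_user_ids(item) for item in node]
    return node


def filter_response_body(body: str) -> str:
    """Hypothetical proxy hook: parse the JSON body, drop the user-ID
    properties, and re-serialise it."""
    return json.dumps(strip_user_ids(json.loads(body)))


original = '{"jcr:primaryType": "cq:PageContent", "jcr:createdBy": "admin", "jcr:title": "English"}'
print(filter_response_body(original))
# prints {"jcr:primaryType": "cq:PageContent", "jcr:title": "English"}
```

Filtering by property name keeps the rest of the JSON intact for consuming applications, at the cost of parsing every JSON response on the way out.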