Our Security team has flagged the issue below.
They have created a Python script that can crawl any of our sites and fetch JCR data such as user IDs. Our sites are public.
Here is a summary of the test; the full details are attached.
The Python project is on GitHub for your testing/reference:
https://github.com/ilatypov/aem-hacker/blob/master/aem_slurper.py
$ time python3 aem_slurper.py www.manulifebank.ca 2>&1 | tee manulifebank.txt
Connecting to www.manulifebank.ca...
2020-09-28 09:23:15-0400 /content/manulife-bank/en_CA/jcr:content admin {"id": "jcr:content", "uri": "/content/manulife-bank/en_CA/jcr:content", "jcr:primaryType": "cq:PageContent", "jcr:mixinTypes": ["mix:lockable", "mix:versionable", "cq:LiveSync", "cq:PropertyLiveSyncCancelled"], "chatButton": "default", "imageSelect": "ui.icon.tab.select.value", "dateFormatField": "MMMM yyyy", "jcr:createdBy": "admin", "jcr:title": "English", "contentCategory": "none", "resourceCardType":
[...]
^C
real 1m19.645s
user 0m0.567s
sys 0m0.346s
Our security concern is that the script can see user IDs (for example, the "jcr:createdBy": "admin" property above) when it crawls pages. The same user IDs are also visible on nodes in CRX when I browse the publisher as an anonymous user.
Question: How can I hide the user IDs from external crawling scripts?
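To make the exposure concrete, here is a minimal offline sketch of the kind of check involved: it walks a JSON rendition shaped like the one in the output above and lists properties that hold a user ID. Only "jcr:createdBy" is taken from the actual output; the other property names, the sample data, and the function name find_user_ids are assumptions for illustration, and this is not the code of aem_slurper.py.

```python
import json

# Property names that commonly hold a user ID in AEM content.
# Only "jcr:createdBy" appears in the output above; the rest are assumed.
USER_ID_PROPERTIES = {
    "jcr:createdBy",
    "jcr:lastModifiedBy",
    "cq:lastModifiedBy",
    "cq:lastReplicatedBy",
}


def find_user_ids(node, path=""):
    """Recursively collect (path, property, value) for user-ID properties."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in USER_ID_PROPERTIES and isinstance(value, str):
                hits.append((path or "/", key, value))
            else:
                # Descend into child nodes and multi-value properties.
                hits.extend(find_user_ids(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for item in node:
            hits.extend(find_user_ids(item, path))
    return hits


# Hypothetical sample, shaped like the JSON rendition in the output above.
sample = json.loads("""
{
  "jcr:primaryType": "cq:PageContent",
  "jcr:createdBy": "admin",
  "jcr:title": "English",
  "par": {"jcr:lastModifiedBy": "some-author"}
}
""")

for hit in find_user_ids(sample):
    print(hit)
```

Running the same function over a response saved from the publisher before and after any change would show whether the user-ID properties are still being served to anonymous requests.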