Hi everyone,
I’m working on implementing structured data markup for paywalled content on AEM as a Cloud Service, and I’d like to add a verification mechanism to ensure that requests claiming to be from Googlebot are legitimate (i.e., through reverse DNS IP verification as described in Google's documentation https://mfpchiesi.atlassian.net/browse/MP-3153 https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot?hl=en ).
Before going further, I wanted to ask:
Has anyone implemented IP verification for Googlebot within AEM as a Cloud Service?
Is it necessary to configure this in collaboration with Dispatcher rules, or can it be handled entirely within AEM?
If both systems (Dispatcher + AEM) are involved, what’s the best practice to ensure smooth communication between them? For example, how would you share or pass verification results between the dispatcher layer and AEM?
Any examples, guidance, or lessons learned would be greatly appreciated!
Thanks in advance,
Adriana
There’s no out-of-the-box AEM module for Googlebot verification, but it can be custom-implemented using a Sling Filter or Servlet Filter in AEM.
The solution works as follows:
1. Intercept Targeted Requests
The filter applies to paths such as:
`/content/...`
Structured data endpoints (e.g., `/page.structure-data.json`)
2. Perform Reverse DNS Lookup
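Extract the client IP from `request.getRemoteAddr()` (behind the CDN and Dispatcher this may return a proxy address, in which case a forwarded header such as `X-Forwarded-For` may be needed instead)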
Do a reverse DNS lookup to get the hostname
Check if the hostname ends with:
`.googlebot.com`
`.google.com`
3. Perform Forward DNS Lookup
Resolve the hostname obtained above
Verify that the original IP is one of the resolved addresses
4. Flag the Request
If both checks pass, mark the request with a flag:
`request.setAttribute("isVerified", true)`
Otherwise, set it to `false`
This flag can then be used in downstream logic, such as structured data components, to control what is exposed to verified bots versus regular users.
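For reference, steps 2 and 3 could be sketched roughly as below. This is only a minimal illustration under a few assumptions, not a tested AEM implementation: the class name `GooglebotVerifier`, the `Dns` interface, and `SYSTEM_DNS` are hypothetical names introduced here, and the allowed suffixes are just the two listed above. It also assumes the IP passed in is the real client IP, which is worth confirming for your CDN and Dispatcher setup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

/** Sketch of the reverse + forward DNS check from steps 2 and 3. */
final class GooglebotVerifier {

    /** Hypothetical seam over DNS so the logic can be exercised without network access. */
    interface Dns {
        String reverse(InetAddress ip) throws UnknownHostException;
        InetAddress[] forward(String host) throws UnknownHostException;
    }

    /** Default resolver backed by java.net.InetAddress. */
    static final Dns SYSTEM_DNS = new Dns() {
        @Override
        public String reverse(InetAddress ip) {
            // Falls back to the textual IP when no hostname can be resolved.
            return ip.getCanonicalHostName();
        }

        @Override
        public InetAddress[] forward(String host) throws UnknownHostException {
            return InetAddress.getAllByName(host);
        }
    };

    // Suffixes from the steps above; the leading dot rejects names
    // such as "evilgooglebot.com".
    private static final String[] ALLOWED_SUFFIXES = {".googlebot.com", ".google.com"};

    private GooglebotVerifier() {}

    /** Returns true only if both the reverse and the forward lookup agree. */
    static boolean isVerified(String clientIp, Dns dns) {
        if (clientIp == null || clientIp.trim().isEmpty()) {
            return false;
        }
        try {
            // An IP literal is parsed directly, without a DNS lookup.
            InetAddress ip = InetAddress.getByName(clientIp.trim());

            // Step 2: reverse lookup, then check the hostname suffix.
            String host = dns.reverse(ip).toLowerCase(Locale.ROOT);
            if (host.endsWith(".")) {
                host = host.substring(0, host.length() - 1);
            }
            if (!hasAllowedSuffix(host)) {
                return false;
            }

            // Step 3: forward lookup must include the original IP.
            for (InetAddress resolved : dns.forward(host)) {
                if (resolved.equals(ip)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            // Any resolution failure is treated as "not verified".
            return false;
        }
    }

    private static boolean hasAllowedSuffix(String host) {
        for (String suffix : ALLOWED_SUFFIXES) {
            if (host.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside the filter, step 4 would then be something like `request.setAttribute("isVerified", GooglebotVerifier.isVerified(clientIp, GooglebotVerifier.SYSTEM_DNS))`. Since each check involves two DNS lookups, caching the result per IP for a short time is probably worth considering.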