SOLVED

Vanity URLs x Dispatcher

Hi, gentlemen.

I have the following scenario:

  • The website has hundreds of pages, and each of them needs a vanity URL, because the content structure in AEM does not match the expected public URLs (e.g., in the content tree a page sits at /content/<site>/<lang>/<product type>/<country>/<state>/<city>/<page>, while its URL must be /<city>/<page> or even /<marketing-stuff>/<another-page-name>).
  • Those vanity URLs must not contain an extension (e.g., .html).

This scenario leads to two problems on which I'd like your help or advice:

  1. As I'll have new vanity URLs each day, following the recommendation at http://dev.day.com/docs/en/cq/current/deploying/dispatcher/disp_config.html ("depending on how restrictive your filter configuration is, you may need to explicitly register each individual vanity URL. In the following example /my/vanity/url") is not an option for me.
    Instead of denying access to all URLs on the dispatcher and then granting access to what should be public (e.g., /content/, /etc/clientlib/, etc., and each vanity URL), I'll have to open access to everything and then deny what is sensitive.
    Is there a list somewhere of what should be protected? Does this approach sound reasonable?

     
  2. The other problem is that I learned (by observation) that vanity URLs are not cached by the dispatcher, which will be a huge problem in my scenario.
    Is that really the case? Have you ever seen or taken an approach to cache these pages? Do you have anything to suggest to solve this problem, even if it involves development?

Keeping a list of from/to URLs on Apache to redirect to the actual AEM URL is not an option either, because this set of vanity URLs is very volatile: new pages will come and go every day.

Thanks a lot for your time and advice!

1 Accepted Solution

Better late than never:

1. Correct. If you need to support vanity URLs that can be anything, you need to allow everything and then deny specific paths, like so:

        /filter
        {
            # to start, deny all requests (we will not handle PUT DELETE TRACE OPTIONS CONNECT PATCH with dispatcher)
            /0001 { /type "deny" /glob "*" }

            # in general allow all GET and HEAD requests (including vanity URLs); POST is allowed selectively below
            /0002 { /type "allow" /glob "GET *" }
            /0003 { /type "allow" /glob "HEAD *" }
            # enable features
            /0004 { /type "allow" /glob "POST *.html*" } # allow POSTs to forms
            /0006 { /type "allow" /glob "POST *.json*" }

            # and then deny specific entries
            # deny consoles
            /0011 { /type "deny" /glob "* /admin*"  } # deny servlet engine admin
            /0012 { /type "deny" /glob "* /crx*"    } # deny content repository
            /0013 { /type "deny" /glob "* /system*" } # deny OSGi console

            # deny non-public content directories
            /0020 { /type "deny" /glob "* /apps*" } # deny apps access

            /0030 { /type "deny" /glob "* /bin*"  }

            /0040 { /type "deny" /glob "* /etc*" }
            /0041 { /type "allow" /glob "* /etc/clientlibs/*" }
            /0044 { /type "allow" /glob "* /etc/designs/*" }
            /0050 { /type "deny" /glob "* /libs*" }
            /0060 { /type "deny" /glob "* /home*" }
            /0070 { /type "deny" /glob "* /tmp*"  }
            /0080 { /type "deny" /glob "* /var*"  }
        }

Please note that the filter rule syntax has changed in recent versions of dispatcher, so you may want to adjust the above. But the principle is the same.
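
To see why the order of the rules above matters: as far as I know, the dispatcher walks the filter rules in order and the last rule that matches a request decides. Here is a minimal Python sketch of that behavior. It assumes shell-style matching against the request line is a close enough stand-in for the dispatcher's glob matching, and it only uses a subset of the rules. The `decide` function and the sample request lines are made up for illustration; they are not part of any dispatcher tooling.

```python
from fnmatch import fnmatchcase

# (type, glob) pairs in file order; a subset of the filter rules above.
RULES = [
    ("deny", "*"),
    ("allow", "GET *"),
    ("allow", "HEAD *"),
    ("allow", "POST *.html*"),
    ("deny", "* /crx*"),
    ("deny", "* /etc*"),
    ("allow", "* /etc/clientlibs/*"),
]


def decide(request_line, rules=RULES):
    """Return the type of the LAST rule whose glob matches the request line."""
    decision = "deny"  # assumption: nothing matched means the request is rejected
    for rule_type, glob in rules:
        if fnmatchcase(request_line, glob):
            decision = rule_type  # a later match overrides an earlier one
    return decision


if __name__ == "__main__":
    for line in (
        "GET /campinas/offers HTTP/1.1",          # vanity URL, falls through to "GET *"
        "GET /crx/de/index.jsp HTTP/1.1",         # hit by the later /crx deny
        "GET /etc/clientlibs/site.css HTTP/1.1",  # re-allowed after the /etc deny
        "PUT /content/site/en.html HTTP/1.1",     # only the initial deny matches
    ):
        print(decide(line), line)
```

This is also why the /etc/clientlibs allow has to come after the /etc deny: swap the two and the deny would win.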

2. Correct. If you use vanity URLs heavily and want them to redirect to the final page (instead of serving "duplicate content" directly under the vanity URL), then all these requests will hit CQ, because the dispatcher cannot cache redirects. This is obviously a performance issue and does not scale, so for a modern site that wants to use vanity URLs (e.g., for sales channels) the out-of-the-box features in AEM are totally unsuitable, IMHO.

Adobe claims there is a new feature since dispatcher version 4.1.9 that allows this (it's not documented anywhere; you can only learn about it in this webinar: http://dev.day.com/content/ddc/en/gems/dispatcher-caching---new-features-and-optimizations.html ). I am disappointed with the vanity URL feature, though:
1. The URL /libs/granite/dispatcher/content/vanityUrls.html actually depends on recent versions of CQ, so it is a CQ feature, not a dispatcher feature as they claim.
2. Instead of just using this vanity URL list to tell the dispatcher what to allow or deny, they could have had the dispatcher send the 30X redirect itself rather than still forwarding the request to CQ. An option to cache redirects in the dispatcher as well would have been a lot better in terms of performance.
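
The way I understand the feature from the webinar, the dispatcher fetches the vanity list from CQ and lets a request through if its path is on that list, even when the filter would reject it. A rough Python sketch of that decision follows. It assumes the list arrives as plain text with one path per line, which is my assumption and not something I can point to in the docs; the function names and sample paths are made up as well.

```python
def parse_vanity_list(text):
    """Parse one vanity path per line, ignoring blank lines and padding."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_allowed(path, filter_allows, vanities):
    """A path the filter rejects still passes if it is a known vanity URL."""
    return filter_allows or path in vanities


if __name__ == "__main__":
    # Hypothetical response body of the vanity list URL.
    body = "/campinas/offers\n/summer-sale/landing\n\n"
    vanities = parse_vanity_list(body)
    print(is_allowed("/campinas/offers", False, vanities))   # on the list
    print(is_allowed("/not-a-vanity", False, vanities))      # not on the list
```

Note that this only decides whether the request is let through. The request still goes to CQ, which is exactly the complaint in point 2.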

Bottom line: you need to solve this in a different way. Approaches I have seen:

1. Do it with Apache mod_rewrite rules. Either build a custom tools page in CQ that lets your editors modify and update the conf file with RewriteRules, or generate an Apache conf file from all the content pages that have a Sling vanity URL page property. Then have CQ export that file and trigger an Apache reload. It's a hack, but it works.

2. Use Varnish instead and throw away the dispatcher completely. Everything that the dispatcher can do, Varnish can do better. You can have more fine-grained flushing and auto-invalidation too.
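
For approach 1, the generated file could look roughly like what this Python sketch produces. It assumes you already have a mapping from vanity path to content path (how you export that from CQ is up to you), that the rules live in the virtual host config rather than in .htaccess, and that an internal rewrite with the PT flag is what you want, so the dispatcher sees the real content path and can cache it. The function name and the sample paths are made up, and the sketch needs Python 3.7 or later.

```python
import re


def rewrite_rules(vanities):
    """Build mod_rewrite lines from a {vanity_path: content_path} mapping."""
    lines = ["RewriteEngine On"]
    for vanity, target in sorted(vanities.items()):
        # Escape the vanity path so it is matched literally; allow a trailing slash.
        pattern = "^" + re.escape(vanity) + "/?$"
        # PT hands the rewritten URL on to the next handler (here: the dispatcher).
        lines.append(f"RewriteRule {pattern} {target}.html [PT]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical export of pages that carry a vanity path.
    sample = {
        "/campinas/offers": "/content/site/en/products/br/sp/campinas/offers",
        "/promo/landing": "/content/site/en/marketing/landing",
    }
    print(rewrite_rules(sample), end="")
```

You would write the result to a file that the vhost includes and reload Apache after each export. Vanity paths containing spaces would need extra handling, since a space ends the pattern argument of a RewriteRule.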
