In the past, the default dispatcher configuration caused any query parameter to bypass the dispatcher cache. To combat that, developers could configure particular query parameters to be ignored (such as utm_source and other tracking parameters).
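For context, that older pattern looked roughly like this in the dispatcher farm config (a sketch; in `/ignoreUrlParams`, "allow" means the parameter is ignored for caching, and "deny" means it bypasses the cache):

```
/ignoreUrlParams {
  # Deny-by-default: any query parameter bypasses the cache...
  /0001 { /glob "*" /type "deny" }
  # ...except known tracking parameters, which are ignored for caching
  /0002 { /glob "utm_*" /type "allow" }
}
```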
I'm not sure I like this. Sure, it increases the default cache hit ratio, but that is something we could already accomplish by configuring query parameters to be ignored. My concerns with this change are:
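As I understand it, the change flips the default to the inverse of the old pattern, something like the following sketch (the `search` parameter here is a hypothetical example, not the actual shipped rule):

```
/ignoreUrlParams {
  # New default: ignore every query parameter for caching purposes...
  /0001 { /glob "*" /type "allow" }
  # ...and only explicitly listed parameters bypass the cache
  /0002 { /glob "search" /type "deny" }
}
```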
1. It requires the dispatcher configs to be updated whenever a new query string parameter is added in code, which most developers won't expect.
2. It does not prevent DDoS attacks (which I've seen Adobe claim it does); it just narrows which query parameters can be used in an attack.
3. It opens the door (or at least seems to) to very serious security concerns.
As a quick example for bullet #3, consider a service that confirms a user signup via an emailed link containing a UUID that points to the registration. For the first person to click the link, everything works fine. The second person to click the link not only fails to complete registration, but also sees whatever cached result user 1 received (which could include account information).
For bullet #1: yes, devs can be trained, and updating the dispatcher isn't a big deal since the config sits in the same codebase. But I'm really wary of any setup where security (bullet #3) is breached by default unless the developer actually remembers to act.
I think I'd be less concerned with this change if the dispatcher didn't pass ignored query parameters through to the publish server. That way, any functionality based on query parameters would fail for all users, including the first, making issues easier to catch in Staging.
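For illustration, something close to this could be approximated today at the web-server layer with mod_rewrite, at least for the narrow case where the query string consists solely of tracking parameters. This is an untested, hypothetical sketch for Apache 2.4:

```
# Hypothetical sketch: discard the query string entirely when it contains
# only utm_* tracking parameters, so neither the cache key nor the
# publish instance ever sees them. The QSD flag (Apache 2.4+) drops
# the query string from the rewritten request.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^utm_[^&=]+=[^&]*(&utm_[^&=]+=[^&]*)*$
RewriteRule ^ - [QSD]
```

Stripping arbitrary ignored parameters while preserving the rest would be considerably messier, which is why I'd rather see the dispatcher itself do it.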
Curious what others' thoughts are. Am I missing something?
As we were building the tool we also acknowledged that not all of these rules are going to work for everyone, so we created an extension mechanism that can be used to fine tune the rule set (or replace it altogether) depending on your specific needs. ... This extension mechanism, along with other features of the DOT, is covered in a lab-format exercise we put together to get folks comfortable with the tool: https://github.com/adobe/aem-dispatcher-experiments/tree/main/experiments/optimizer
That's awesome that there are built-in ways to override the default validation rules! That definitely lets our teams act in a way we feel is more secure by default. That said, I'm still concerned that many teams will simply follow the default rules and leave open a potential security hole for which their teams will ultimately be blamed.
We’re open to ideas on how we can make this easier. Currently, the archetype supports the `mvn install -PautoInstallSinglePackage -PautoInstallSinglePackagePublish` flags which will deploy packages to both author and publish instances in a single command. Would an official Docker container facilitate running a local dispatcher? Are there other ways we can make the use of the dispatcher easier and more commonplace for day-to-day development?
We've developed Vagrant and Docker setups internally, but it's clear to me that it's just not something the developer community wants. Even if a full build can push to both servers at the same time, most devs are using quicker methods to deploy code (particularly FE code) without maven builds. And even if mechanisms are available to sync FE code to both AEM servers simultaneously, and the dispatcher configs as well, there's still the reality of dealing with 3 servers and 3 sync processes (i.e. 3 points of failure) that will inevitably experience random issues. That's not to mention the need to keep content in sync across author/publish as well. Again, I get that "by the book" developers should be using a full 3-server setup, but my experience in the field is that double-clicking an author jar is just far, far too tempting for all but the most devout.
An additional thought on your example: if this service were implemented as a servlet (such as this Create Servlet example, but registered with "sling.servlet.methods=get" instead), it's worth noting that responses to requests without file extensions will still not be cached by the dispatcher, regardless of how the ignoreUrlParams rule is configured.
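Concretely (with hypothetical paths), the distinction looks like this:

```
# Not cacheable: no file extension, so the dispatcher won't cache the response,
# and the uuid parameter always reaches the publish instance
GET /bin/confirm-signup?uuid=1234

# Potentially cacheable: .html extension, so ignoreUrlParams applies; if uuid
# is ignored for caching, subsequent requests may be served user 1's response
GET /content/site/confirm.html?uuid=1234
```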
Though this is a valid implementation recommendation, it doesn't address my concern with "secure by default" and in that way is very similar to the recommendation to add a "no-cache" header to the service.
It seems we may have to "agree to disagree" on some of these topics. That said, what are your thoughts on why the dispatcher forwards query parameters that are ignored for caching purposes on to the AEM server? Given that it only forwards the parameter in the case of a non-cached request, the system is indeterminate (from the end user's perspective) as to whether a query parameter will or won't be honored on a given request. If URL parameters ignored for caching purposes were also stripped from the AEM request, such that they NEVER reach AEM, then I could see merit in this new default rule for /ignoreUrlParams.