SOLVED

How do you prevent duplicate URLs for two pages that have the same title, but are located in different folders in Author?


Level 2

Hi,

 

So this is a rather complex question. In our authoring environment we have it set up in a way where we have pages nested under a year and month folder structure, and so we have published pages that include the year and month in the URL. I want to shorten our URLs to simply not show year and month, just the domain and page.

 

However, how do we prevent duplicate URLs and the conflict that occurs for pages published with the same title but different folder structure in author?  I'm told that the most recently updated page will overwrite the other one for the URL path. How do I avoid that and have conflict resolution so that both pages can live on my site, but not share the same URL?  (This implies that the author doesn't know or realize that they titled a page the same. Basically, how do I prevent an author from publishing a page that could potentially have the same URL as a previously published page?)

 

Thanks!


1 Accepted Solution


Correct answer by
Employee Advisor

I cannot say whether the /year/month/ structure is a good one for your type of site. But the content structure is essential in AEM, and it typically translates more or less directly into the URLs of your public site. So maintaining two different structures (a hierarchical one on author, a flat one for publish URLs) is hard, and there is not much out-of-the-box support for it.

Many AEM practitioners will also recommend against doing that, especially if it's "just" for SEO purposes; getting it right in all relevant aspects (rewriting all links, getting dispatcher cache invalidation right, ...) is non-trivial and requires quite some implementation work.

 

Regarding your question: you could create a filter that intercepts the POST requests to libs/wcm/core/content/sites/createpagewizard/_jcr_content and checks whether the values of the "pageName" and "./jcr:title" parameters are already in use (probably with a query or something similar). If the name is not allowed, you can try returning a 412 (Precondition Failed).
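To make the check itself concrete, here is a minimal plain-Java sketch of the decision such a filter would have to make. It deliberately uses no AEM or Sling APIs: the class `PageNameCheck`, its method names, and the idea of passing in a collection of existing page paths (which in a real filter would come from a query) are all assumptions for illustration, not part of any product API.

```java
import java.util.Collection;
import java.util.Optional;

/** Hypothetical helper: decides whether a requested page name is already taken. */
final class PageNameCheck {
    static final int SC_OK = 200;
    static final int SC_PRECONDITION_FAILED = 412;

    /** Returns the last path segment, e.g. "someurl" for "/content/project/07/2021/someurl". */
    static String pageName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Finds an existing page (in any folder) whose name equals the requested one. */
    static Optional<String> findConflict(Collection<String> existingPaths, String requestedName) {
        return existingPaths.stream()
                .filter(p -> pageName(p).equals(requestedName))
                .findFirst();
    }

    /** Status the filter would send: 412 when the name is taken, 200 otherwise. */
    static int statusFor(Collection<String> existingPaths, String requestedName) {
        return findConflict(existingPaths, requestedName).isPresent()
                ? SC_PRECONDITION_FAILED
                : SC_OK;
    }
}
```

In an actual filter, `existingPaths` would be the result of whatever lookup you choose, and the conflicting path returned by `findConflict` could be included in the error message shown to the author.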

 

A quick Google search did not turn up any documentation or tutorial on how to customize this dialog, but maybe you can start from the more generic validation approach described at [1].

 

[1] https://blogs.perficient.com/2017/11/06/aem-touch-ui-dialog-validation-new-best-practice-use-foundat...

 

 

 

 

 


8 Replies


Community Advisor

Hi @oz6616 

 

Here is my understanding of the question.

You have pages such as:

/content/project/07/2021/someurl.html

/content/project/08/2021/someurl.html

 

Both URLs can exist on author and publish. But once you drop the month and year, the URLs will look like this to the end user:

/content/project/someurl.html

/content/project/someurl.html

 

As you can see, both URLs are now the same, so when a user requests that URL the dispatcher cannot tell which page to serve. The URL has to be unique, which means the two pages cannot live on the same domain once shortening is enabled. Alternatively, you could distinguish them with a query parameter, so that the URLs look like the ones below, and handle them with redirects at the dispatcher.

 

/content/project/someurl.html?m=07

/content/project/someurl.html?m=08

 

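# Redirect the flat URL to the dated page selected by the "m" query parameter.
# QSD (Apache 2.4+) drops the query string from the redirect target.
RewriteCond %{REQUEST_URI} ^/content/project/someurl\.html$
RewriteCond %{QUERY_STRING} (^|&)m=07(&|$)
RewriteRule ^ /content/project/07/2021/someurl.html [R=302,L,QSD]

RewriteCond %{REQUEST_URI} ^/content/project/someurl\.html$
RewriteCond %{QUERY_STRING} (^|&)m=08(&|$)
RewriteRule ^ /content/project/08/2021/someurl.html [R=302,L,QSD]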

 

If you do not want to keep both pages live on publish and want to prevent publication of the second URL while the first one is live, then you need to write a custom ReplicationPreprocessor, which is invoked each time you trigger a replication request. The logic can be: take the current page name (i.e., someurl here) and look for any other occurrences of the same page name within the content hierarchy. If one is found, check the replication status of that page. If that page is replicated, stop the replication of the current page; if not, proceed with replication. This ensures that only one copy is present on publish and you do not have to manage any redirects.

 

Here is a sample preprocessor:

https://labs.tadigital.com/index.php/2019/06/25/aem-preprocessor/
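The decision logic described above can be sketched in plain Java, independent of the replication API. This is only an illustration under assumptions: the class `PublishGuard`, the method `blockingPage`, and the map from page path to "is currently replicated" are hypothetical stand-ins for whatever lookup a real preprocessor would perform.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: finds the published page that should block a new activation. */
final class PublishGuard {

    /** Returns the last path segment, e.g. "someurl". */
    static String pageName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /**
     * Returns the already-published page with the same name as candidatePath, if any.
     * replicatedByPath maps each page path to whether it is currently live on publish.
     */
    static Optional<String> blockingPage(Map<String, Boolean> replicatedByPath,
                                         String candidatePath) {
        String name = pageName(candidatePath);
        return replicatedByPath.entrySet().stream()
                .filter(e -> Boolean.TRUE.equals(e.getValue())) // only pages live on publish
                .map(Map.Entry::getKey)
                .filter(p -> !p.equals(candidatePath))          // re-publishing the same page is fine
                .filter(p -> pageName(p).equals(name))          // same name in another folder
                .findFirst();
    }
}
```

A preprocessor built on this idea would abort the replication when `blockingPage` returns a value and let it proceed when the result is empty.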

 

Hope this helps!

Thanks 


Level 2

Thanks @Asutosh_Jena_, yes the scenario is exactly that, and I do want to prevent publishing of a second page with the same URL!

 

Out of curiosity, from what you're suggesting, with the query parameters, can the rewrite condition be done without a query parameter, but rather an append to the end of the slug?

 

First page originally published located in a prior year/month folder: /content/project/someurl.html

Different page, different content but same title in a newer folder: /content/project/someurl-08-2021.html
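As an illustration of the suffix idea, a minimal sketch of generating such a slug could look like the following. It is plain Java with no AEM calls; the class `SlugSuffix`, the method `uniqueSlug`, and the fallback to a numeric counter are assumptions made for this example only.

```java
import java.util.Set;

/** Hypothetical helper: appends "-MM-YYYY" to a page name that is already taken. */
final class SlugSuffix {

    /**
     * Returns name unchanged when it is free. Otherwise appends "-month-year",
     * and adds a numeric counter if the dated name is taken as well.
     */
    static String uniqueSlug(Set<String> taken, String name, String month, String year) {
        if (!taken.contains(name)) {
            return name;
        }
        String dated = name + "-" + month + "-" + year;
        String candidate = dated;
        for (int i = 2; taken.contains(candidate); i++) {
            candidate = dated + "-" + i;
        }
        return candidate;
    }
}
```

With "someurl" already taken, calling `uniqueSlug(taken, "someurl", "08", "2021")` yields "someurl-08-2021", matching the example above.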


Community Advisor

Hi,

You cannot map an incoming request to multiple AEM paths. What you can do is serve a page from a different folder by intercepting the request with a Sling filter.

The Sling filter checks the request and serves the correct or latest version of the page. But you have to build a mechanism to clear the cache when a new version of the page is available.

If you are using ACS AEM Commons, it is easy to flush the cache by publishing other content.



Arun Patidar


Level 2

I'm aware of this. The situation I'm looking to prevent is a newer page ending up with the same URL as a previously published page. What I want is a way to flag to the author that they need to make the page path unique.


Community Advisor

Hello @oz6616 ,

 

A single URL path cannot resolve to two AEM content paths. See https://www.aemquickstart.in/2015/11/how-is-resource-resolution-done-in-sling_85.html for more info. We can try to use either query parameters or selectors (preferred, since the result will be cached) to take users to the different page locations.

I can think of the following approach:

Consider how end users are going to reach these pages. If the links are only shown in one particular component, or a few, we can use a common service/util method to generate the link to the desired page.

The page links can be generated from the parent structure, using selectors.

For example, say we have "/content/project/en/2021/08/titlename.html" and "/content/project/en/2021/07/titlename.html".

The backend of the component will render the display links for these pages as "/content/project/en.uniquesel.2021.08.titlename.html" and "/content/project/en.uniquesel.2021.07.titlename.html".

 

Then we can use a Sling servlet filter that reads the selectors, quickly checks whether the first selector (index 0) equals 'uniquesel', and, if so, builds the destination path from the remaining selectors.

 

i.e "/content/project/en.uniquesel.2021.08.titlename.html" --> "/content/project/en/2021/08/titlename.html"

Hence redirecting the user to the desired destination.
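The mapping in both directions can be sketched with plain string handling. This is a minimal illustration and not a Sling filter: the class `SelectorLinks` and its methods are hypothetical, the site root is passed in explicitly, and it assumes that page and folder names contain no dots.

```java
import java.util.Optional;

/** Hypothetical helper: converts between content paths and "uniquesel" display links. */
final class SelectorLinks {
    static final String MARKER = ".uniquesel.";
    static final String EXT = ".html";

    /** "/content/project/en/2021/08/titlename.html" -> "/content/project/en.uniquesel.2021.08.titlename.html" */
    static String toDisplayLink(String root, String contentPath) {
        if (!contentPath.startsWith(root + "/") || !contentPath.endsWith(EXT)) {
            throw new IllegalArgumentException("Not an .html page below " + root + ": " + contentPath);
        }
        String relative = contentPath.substring(root.length() + 1, contentPath.length() - EXT.length());
        return root + MARKER + relative.replace('/', '.') + EXT;
    }

    /** Reverse mapping; empty when the URL does not carry the "uniquesel" selector. */
    static Optional<String> toContentPath(String displayLink) {
        int idx = displayLink.indexOf(MARKER);
        if (idx < 0 || !displayLink.endsWith(EXT)) {
            return Optional.empty();
        }
        String root = displayLink.substring(0, idx);
        String selectors = displayLink.substring(idx + MARKER.length(), displayLink.length() - EXT.length());
        return Optional.of(root + "/" + selectors.replace('.', '/') + EXT);
    }
}
```

The component backend would call something like `toDisplayLink` when rendering links, and the filter would use the reverse mapping to find the page to redirect or forward to.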

 

In reality the page structure may vary, say /content/project/en/2021/08/01/titlename.html and /content/project/en/2021/07/31/titlename.html, so we will have to consider what kinds of page links the end user may come across.

 

Not sure if this serves the purpose, but the importance of the requirement must be weighed against the level of effort (LOE) and the added complexity.


Employee Advisor

Regarding your initial question: this does not work properly, and the reasons have already been described in the other responses.

Also, I find it odd that you create a content structure which you then try to hide. The content/information structure is an inherently important part of your AEM application design and affects many aspects, see [1].

To your second question: you should not do that. Authors should not be restricted from naming pages as they need.

 

[1] https://cqdump.joerghoh.de/2017/11/13/creating-the-content-architecture-with-aem/


Level 2

We created the content tree structure so that our authors could easily find where they saved their articles. However, many sites and blogs no longer use the "/year/month/" structure in their URL paths. It's redundant information, so I'd like to hide it to simplify our URLs and make them more SEO-friendly. At the same time, we want to keep our articles organized in some fashion and not buried on the authoring side. The problem, however, is that with many authors, the chance of producing two different pieces of content with the same title (and ultimately the same URL) is high, and I'd like to find a solution to avoid that.

 
