Adobe AEM is the perfect platform to create and manage a personalized user experience on a website or in a web application. Is AEM a valid content services platform as well? Gartner defines content services platforms (CSPs) as "the foundational component in an organization for the management and use of content. CSPs provide a way for employees to retrieve and work with content in a modern, seamless way across devices and organizational boundaries. As such, they are a core component of any organization’s digital workplace strategy". The keywords in this definition are "organization" and "employees": CSPs are generally used for internal enterprise applications used by employees. This is the land of OpenText Documentum, Microsoft SharePoint, Alfresco and Box, among others. Adobe AEM is not considered a CSP. This is odd, because AEM is one of the leaders in technical documentation management (have you heard about Adobe Experience Manager Guides?).
A valid definition of Adobe AEM is "Component Content Management System" (CCMS), as it manages content as individual components, such as text and images, instead of entire pages or HTML documents. The HTML sent to a browser by Adobe AEM Sites, for example, does not actually exist in the repository: it is created on the fly by merging the HTML generated to render each component of an AEM page.
The primary difference between a typical CSP and a CCMS is the level at which content is managed. A typical CSP manages content at the document level, while a CCMS manages content at the component level, which is more complex. A CSP gives some control over entire documents (PDF files or a PowerPoint presentation, for example), but it cannot control what is inside them, because the content is not created in the form of components.
This is not an article about how much better Adobe AEM is than OpenText Documentum: there are projects where Apache Sling and Adobe AEM provide better results in terms of TCO, ease of implementation or performance, and there are projects where Documentum excels in terms of scalability, robustness, or support for distributed architectures. As always, there is no one-size-fits-all technology!
I strongly believe that Adobe AEM is a solid content services platform. Let me tell you why!
The customer
My customer is a leading company in space propulsion, with its headquarters near Rome, Italy. This customer designs, develops, produces, and integrates space launchers. The company offers competitive solutions for launching institutional, governmental, and commercial payloads into Earth orbit. I started working for this company more than 15 years ago as an OpenText Documentum consultant (I do admit, I am part of Generation X!). For this customer I implemented an application to manage technical documentation and another content management application, named Protocollo Informatico. Both applications are based on OpenText Documentum. A few months ago, this customer defined a newer architecture and decided to switch off the Documentum platform.
The old application
The Protocollo Informatico application is used by about 150 users to manage fifty thousand documents, all with the same set of metadata. This application is used to provide a unique ID to each document sent or received by several offices of this customer. Each office has its own counter to generate new IDs. An end user can belong to more than one office and can edit only the documents of the offices to which he or she belongs.
The old Documentum application implements these simple requirements and provides administration tools to define rules that assign different IDs to documents managed by different offices. Users are authenticated via Active Directory. The customer asked for a new application built on an open-source stack with a lower total cost of ownership (TCO) than the OpenText Documentum-based application. My customer also asked, of course, to migrate the existing data and the user and group definitions, and to implement the same security rules.
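The rules above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the application's code: the class name Registry, the user names, the office codes and the OFFICE-000001 ID format are all assumptions made for the example.

```python
from collections import defaultdict


class Registry:
    """Toy in-memory model: one counter per office, edits limited to members."""

    def __init__(self, memberships):
        # memberships maps a user name to the set of offices the user belongs to
        self.memberships = memberships
        self.counters = defaultdict(int)  # office -> last number issued
        self.documents = {}               # document ID -> owning office

    def can_edit(self, user, office):
        """A user may edit only the documents of an office he or she belongs to."""
        return office in self.memberships.get(user, set())

    def register(self, user, office):
        """Issue the next ID for the office (hypothetical OFFICE-000001 format)."""
        if not self.can_edit(user, office):
            raise PermissionError(f"{user} does not belong to office {office}")
        self.counters[office] += 1
        doc_id = f"{office}-{self.counters[office]:06d}"
        self.documents[doc_id] = office
        return doc_id


# Hypothetical users and offices
registry = Registry({"anna": {"HR", "LEGAL"}, "marco": {"HR"}})
first_hr = registry.register("anna", "HR")
second_hr = registry.register("marco", "HR")
first_legal = registry.register("anna", "LEGAL")
```

Each office advances its own counter independently, and a user who belongs to two offices can register documents in both.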
The old client application based on Documentum Webtop
The new application
I had always wanted to propose Adobe AEM as a generic content services platform, to manage all of a customer's content. The level of trust between me and my client finally gave me the opportunity to create an application based on Apache Sling 12. Apache Sling is the open-source core of Adobe AEM. The new Protocollo Informatico covers all the requirements. In fact:
It is based on open-source projects and standards (Apache Felix, Apache Jackrabbit Oak, Apache Sling).
It does not require a dedicated database.
It is very easy to install and manage.
It offers many of the functionalities of an OpenText Documentum repository.
It can be integrated with Active Directory.
The security model can protect even a single node of the Java Content Repository with a specific ACL, the same granularity provided by Documentum.
The logical and physical architecture
Our architecture is composed of a single Apache Sling 12 instance installed on one node. Is this similar to an AEM Author instance? Is it a Publish instance? This is not a web content management application, and our architecture does not require publishing data to one or more different instances, so there is no publishing environment. We decided not to install a web cache, as the Protocollo Informatico works only with authenticated sessions and data is protected by many ACLs: two different authenticated users will therefore see different data. An HTTP accelerator like Varnish (a product very similar to the Adobe Dispatcher) would not provide dramatic speed improvements, as very little content is cacheable. Like the Dispatcher, an HTTP accelerator could be configured to act as a web application firewall to detect and filter malicious requests before they reach the Apache Sling instance. We decided not to use a web application firewall because the application runs inside an intranet, few objects/nodes are accessible to anonymous users, and we did not expect attacks on the application from the intranet by authenticated (and monitored) users. For these reasons, the logical and physical architectures coincide.
The new application architecture
Environment setup
Have you ever installed a simple Documentum repository on a single server? I have done so many times, and I still do not consider it an easy task: you must install a database instance, then the Documentum Content Server (the core of any Documentum architecture), the full-text index server, and the web application server (usually Apache Tomcat or a more complex application server like Oracle WebLogic, IBM WebSphere or Red Hat JBoss EAP). Finally, you must install at least the administration web app and the standard REST API server. Normally this should be an easy task, but sometimes it can be a nightmare, given the many supported combinations (Windows, Linux, Oracle Database, SQL Server, etc.), each of which requires different binaries and has specific requirements. A basic installation can take hours or days (I remember a complex installation for a web content management application based on Documentum, spread over many servers, that took two weeks!). Now think about Adobe AEM: how long does a basic Adobe AEM Author installation take? No more than ten to fifteen minutes: you just have to install a supported Java Development Kit and then start the jar file provided by Adobe. An Apache Sling installation takes less than five minutes to start. Apache Sling provides the most important functionalities of a basic Documentum installation (authentication, authorization, metadata management, full-text indexing, transactions, versioning, observation, etc.). Compared to Documentum on a single server, a basic AEM or Apache Sling installation requires fewer resources (less RAM, fewer CPUs). For these reasons alone, the TCO of an Adobe AEM or Apache Sling system is much lower than that of a Documentum-based system.
The data model
I like to define Apache Jackrabbit Oak as an object database. That means that every piece of data is modeled as an object (a node, in JCR terminology); objects are instances of classes (or node types). Every node is identified by a path, and nodes are organized in a tree, like a file system, starting from a root node “/”. The standard node type hierarchy provides most of the classes needed to model applications, but the developer can extend the hierarchy by adding new node types. That is what Adobe did to implement AEM: Adobe created new classes, like cq:Page or cq:PageContent.
The JCR data model is very similar to the one Documentum exposes to the developer. One interesting difference is that in the JCR some classes do not have to specify formally, in full detail, how they are defined. These classes, like the node type nt:unstructured, have few mandatory attributes, and the developer can also save information in new attributes that were never defined before. In that sense we could say that the JCR is a schema-based database that exposes some schemaless classes. This is a cool feature, and that’s why in AEM we can add completely new attributes to specific instances of cq:Page or to an nt:unstructured node, for example. It is important to note that in AEM applications, architects and developers usually do not extend the data model, simply because they do not need to.
For the Protocollo Informatico application we defined just two new node types:
pi:sequence, a subclass of oak:Unstructured, to define a counter used to generate IDs for new registrations of a specific office;
pi:registration, a subclass of oak:Unstructured, to store all the information related to a document received or sent by the customer.
As both custom classes are subclasses of the built-in oak:Unstructured node type, the developer can add all the attributes he or she needs, without asking a data architect to define, optimize, and refactor the custom data model. When needed, the developer can store new data in new attributes of pi:sequence or pi:registration, much as he or she would add, remove, or modify the attributes of a Java class. Certainly, the freedom to work with schemaless classes can generate problems: governance is always needed to avoid chaos, but in my experience a skilled developer can manage this powerful feature very well.
In a CSP application, extending the data model is a common practice, and a new custom class is useful for many reasons. The most important one is to easily filter objects of a class without thinking about the path where these objects are stored: is there anything simpler than a select * from [pi:registration] where ... statement? A new class is also useful for creating indexes that apply only to the instances of that class.
There are many similarities between the Documentum and Adobe AEM data models, for example Documentum Aspects and JCR mixins. Without going into more detail, the JCR data model is at least as powerful as (in my opinion, more powerful than) the one provided by Documentum.
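As a minimal sketch of such a statement, the helper below assembles a JCR-SQL2 query that selects nodes by type only, with optional equality filters. The function names and the office property are hypothetical; the article does not list the actual attributes of pi:registration.

```python
def sql2_literal(value):
    """Quote a string for JCR-SQL2: single quotes are escaped by doubling them."""
    return "'" + str(value).replace("'", "''") + "'"


def registrations_query(filters=None):
    """Build a JCR-SQL2 query over pi:registration nodes, wherever they are stored.

    filters maps property names to required values; the property names
    used in the examples are assumptions, not the real data model.
    """
    conditions = [
        f"r.[{name}] = {sql2_literal(value)}"
        for name, value in (filters or {}).items()
    ]
    query = "SELECT * FROM [pi:registration] AS r"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


all_registrations = registrations_query()
hr_registrations = registrations_query({"office": "HR"})
```

Note that no path appears in the query: the node type alone identifies the objects, which is the point made above.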
For the Protocollo Informatico application, all the data read and written by users is stored under the new /data/protocollo path of the Java Content Repository. Users can optionally add one or more files to each registration. Uploaded files are stored as instances of the standard node type nt:file under a child node of the registration named allegati (Italian for “attachments”).
End users can manage files with the new Protocollo Informatico client and via WebDAV: Apache Sling and Adobe AEM offer this alternative way to browse the database, mapping the JCR repository as a standard disk on the user's workstation. Below is an example of navigating the Protocollo Informatico via the Finder on Apple macOS: a user can add, remove, or copy files directly with file managers such as the macOS Finder or Windows Explorer.
Browsing the Apache Sling repository via WebDAV on macOS
The persistence layer
Apache Jackrabbit Oak and Adobe AEM offer two main options for the persistence layer: TarMK and MongoMK. TarMK is the standard persistence mechanism used by Adobe Experience Manager. It supports very high rates of both read and write throughput with zero external dependencies. MongoMK uses the MongoDB NoSQL database. The MongoMK option is used to implement active/active high availability or to manage very large repositories, leveraging the linear horizontal scalability of MongoDB. For our customer we selected the TarMK option. TarMK’s metadata and files are always consistent. A hot backup can be implemented using any file-based backup tool, without any special procedures or tools. Restore operations are as easy as backup operations. On the other hand, a Documentum backup and restore plan requires more effort and care, because metadata and files are managed by two different components: the RDBMS and one or more file systems. Another point for Adobe AEM and Apache Jackrabbit: the persistence layer and the TarMK option reduce the TCO of these technologies.
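The layout described above can be pictured as a small tree. The sketch below models it with plain Python dictionaries: the nt:file structure (a jcr:content child of type nt:resource holding jcr:data and jcr:mimeType) is standard JCR, while the function names, the office property and the example registration path are hypothetical. The node type of allegati is not stated in the article, so none is set here.

```python
def file_node(data, mime_type):
    """Describe an nt:file node: its content lives in a jcr:content child."""
    return {
        "jcr:primaryType": "nt:file",
        "jcr:content": {
            "jcr:primaryType": "nt:resource",
            "jcr:mimeType": mime_type,
            "jcr:data": data,
        },
    }


def registration_node(properties, attachments=None):
    """Describe a pi:registration node with optional files under 'allegati'.

    attachments maps a file name to a (bytes, mime type) pair.
    """
    node = {"jcr:primaryType": "pi:registration", **properties}
    if attachments:
        node["allegati"] = {
            name: file_node(data, mime) for name, (data, mime) in attachments.items()
        }
    return node


def attachment_path(registration_path, file_name):
    """Path of an uploaded file below a registration node."""
    return f"{registration_path}/allegati/{file_name}"


# Hypothetical registration with one attached PDF
node = registration_node(
    {"office": "HR"},
    {"letter.pdf": (b"%PDF-1.7", "application/pdf")},
)
path = attachment_path("/data/protocollo/HR/HR-000001", "letter.pdf")
```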
The front-end application
The front-end application is based on Angular 12, Angular Material and Bootstrap. It is a modern, responsive single-page application. In my opinion, the implementation costs of the front-end application are the same regardless of whether the repository is implemented with OpenText Documentum or with Adobe AEM / Apache Sling. In our case, the custom application consumes services exposed via the standard Apache Sling GET and POST servlets and via custom REST services. The Angular application is stored inside the Java Content Repository, under the /content/protocollo path.
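To give an idea of what calling the standard Sling servlets looks like, here is a minimal sketch that only builds the HTTP requests, without sending them. The default GET servlet renders a node as JSON when a .json extension (optionally preceded by a depth) is appended to its path, and the POST servlet creates or updates a node from form fields. The base URL, the registration path and the office property are assumptions, and authentication is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: a local Sling instance


def get_json_request(path, depth=1):
    """GET <path>.<depth>.json is answered by Sling's default GET servlet."""
    return Request(f"{BASE_URL}{path}.{depth}.json", method="GET")


def create_registration_request(path, properties):
    """POST form fields to a path: the Sling POST servlet creates or updates the node."""
    fields = {"jcr:primaryType": "pi:registration", **properties}
    body = urlencode(fields).encode("utf-8")
    return Request(
        f"{BASE_URL}{path}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical registration path and property
read = get_json_request("/data/protocollo/HR/HR-000001")
write = create_registration_request("/data/protocollo/HR/HR-000001", {"office": "HR"})
```

The requests could then be sent with urllib.request.urlopen, or an equivalent call could be made from the Angular client with its HTTP client, once credentials are added.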
New client when requesting a new ID
New client returning a new ID
New client advanced search
New client dashboard available for the Administrator role
The business logic layer
The back-end has been implemented as a single OSGi bundle with new Sling Servlets and custom REST services. An important point here is that there is no vendor lock-in: all the required skills come from open-source projects and frameworks like Apache Sling, Apache Jackrabbit Oak and Apache Felix. There are tens of thousands of developers in the world with these skills (consider that the LinkedIn group on Adobe AEM that I created and manage has more than 11 thousand members). By contrast, a platform like Documentum is not based on standards, and fewer developers and system integrators have the right skills to implement a new application on it: another plus point for Adobe! Another important point is that Apache Sling and Adobe AEM are based on OSGi: a configuration change or a bundle update does not require a restart of the entire application, and all configurations take effect immediately. That might seem an irrelevant characteristic, but some configurations, such as a simple label change in a Documentum properties file, can require an application server restart. In our shared development environment we started the Apache Sling instance once, months ago, and so far we have not had a valid reason to restart it even once!
So, is Adobe AEM a valid Content Services Platform?
In my next article, I will describe how my colleagues and I migrated the data from the OpenText Documentum repository to the Apache Sling repository. I will then discuss the benefits of the new application and give my final considerations on why I consider Adobe AEM to be one of the most important content services platforms on the market, despite what independent analysts say.
About the author
Yuri Simione has more than 25 years of experience in IT. He is a partner at Next 2U Consulting, a small Italian company focused on content management for a number of large direct customers. Yuri is also Sales Manager, EMEA for Ultipa, a graph database vendor based in San Ramon, California. Yuri is one of the eighteen specialists selected by Adobe for its inaugural Adobe AEM Champion program. Yuri created and manages a LinkedIn group on Adobe AEM with more than 11k members, the largest independent community of AEM developers and consultants on LinkedIn. You can follow Yuri on LinkedIn and on Twitter.