Adobe Experience Platform Data Collection

Alexis_Cazes_ · 1/19/22

This post is part of a series about auto-tagging

When using a tag management system (TMS) such as Adobe Launch, you can choose between different tagging methodologies to meet your requirements. There are three main approaches:

DOM scraping, which uses the DOM API to gather data from the web pages. While it is a fast and flexible approach, it is also fragile: any change to the DOM can break your implementation without notice.
Direct Call Rule (DCR), which involves calling the rule/tag directly from your platform code. This approach is less flexible than DOM scraping, but it is more robust and allows you to streamline your implementation. The main issues with a DCR implementation are that in most instances there is no defined data structure, and that your platform source code has to reference the TMS object, which becomes significant technical debt.
Data layer, which lets you define a JavaScript object containing a structured version of the data you need to collect. While it is in the same spirit as the DCR implementation, it has the benefit of not referencing the TMS object, which removes the technical debt issue. You can also define a well-structured object to meet all of your tagging needs.
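To make the contrast concrete, here is a minimal sketch of the three approaches side by side. The selector, the rule name addToCart, the product object and the doc/tms stand-ins are all hypothetical and only exist so the sketch can run outside a browser; in a real page doc would be document and tms would be the TMS object (for Adobe Launch, _satellite, whose track method triggers a direct call rule).

```javascript
// Hypothetical stand-ins so the sketch runs outside a browser.
const doc = {
  querySelector: (selector) =>
    selector === "h1.product-title" ? { textContent: " Blue Shoes " } : null,
};
const trackedCalls = [];
const tms = {
  track: (ruleName, detail) => trackedCalls.push({ ruleName, detail }),
};

// 1. DOM scraping: the tag reads the value straight from the markup.
//    It returns undefined, without any error, if the markup changes.
function scrapeProductName(doc) {
  const node = doc.querySelector("h1.product-title");
  return node ? node.textContent.trim() : undefined;
}

// 2. Direct Call Rule: platform code calls the TMS directly, so a reference
//    to the TMS object lives in the platform source code.
function onAddToCart(tms, productName) {
  tms.track("addToCart", { productName });
}

// 3. Data layer: platform code only writes to a plain object;
//    the TMS reads from it and is never referenced by the platform.
const digitalData = { product: { name: "Blue Shoes" } };

const scraped = scrapeProductName(doc);
onAddToCart(tms, scraped);
```

Only the second approach forces the platform code to know about the TMS; the third keeps the contract down to the shape of a plain object.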

What is a data layer?

A data layer is a data structure which allows you to categorize, capture and display the data in the most efficient way. As part of the tagging effort, you will be required to deploy analytics, personalization and marketing tags. In most cases each of these products will be provided by a different third party, and each of them will require the data to be passed in a specific format.

The data layer will allow you to expose client-side the details that will be required for the tagging implementation to work in the most efficient way.

It is stored in a JavaScript object that you can access at the window level. Each provider, be it Google or Adobe, has its own definition of the data layer. While that definition fits the requirements of the specific provider, it will most likely not be flexible or portable to other providers. For this reason, you should define your own data layer.

Flat vs nested data layer

Data layer structures come in different shapes and forms, some more complex than others. You can choose between two distinct types of data layer.

A flat data layer consists of a JavaScript object in which no property is itself an object. Each property can be a string, a number, a boolean or an array. All properties are on the same level, and when you access the data in your code you just need to use window.myDataLayer.property1. While a flat data layer seems simpler to implement, it is in fact harder to maintain and consume. Unlike with a nested data layer, every property needs a detailed name.

For example, for each piece of page data you will need to add page to the name of the property to show that the detail relates to the page: pageName, pageUrl, pageQuery.

A flat data layer has no notion of objects, so you cannot easily group data by category. The more the structure grows, the harder it becomes to check the state of the data layer.

Unlike a flat data layer, a nested data layer is built from objects. A property can be of any type, and it is easier to group data into specific categories. It is also easier to define, maintain and extend a nested data layer: because you define objects, adding a new property to an object is simpler than working out whether a property already exists somewhere in the flat data layer to capture the same data.

//Flat
var digitalData = {
    pageName: "My page",
    pageUrl: "www.domain.com/something",
    pageQuery: "?param1=value",
    brand: "Brand1",
    userAuthenticated: true,
    userId: "111",
    userType: "Type1",
    siteSection: "Section1",
    siteBusinessArea: "Area1"
}

//Nested
var digitalData = {
    page: {
        name: "My Page",
        url: "www.domain.com/something",
        query: {
            string: "?param1=value",
            params: [{
                name: "param1",
                value: "value"
            }]
        }
    },
    site: {
        brand: "Brand1",
        section: "Section1",
        business: {
            area: "Area1"
        }
    },
    user: {
        authenticated: true,
        id: "111",
        type: "Type1"
    }
}

I advise always using a nested data layer. We want to be able to maintain and extend our data layer easily, and to inspect it easily in the developer console. A nested data layer lets you navigate directly to the category you want to validate instead of going through the list of all properties in the data layer object. So if you want to check all the details about the page, you simply navigate to the page object in the data layer, and you can be sure you will not miss any data.
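As a rough illustration of the inspection point, the sketch below collects the page details from each shape. The variable names flat and nested are only used here to keep both versions side by side; the properties are taken from the samples above.

```javascript
const flat = {
  pageName: "My page",
  pageUrl: "www.domain.com/something",
  brand: "Brand1",
  userId: "111",
};

const nested = {
  page: { name: "My page", url: "www.domain.com/something" },
  site: { brand: "Brand1" },
  user: { id: "111" },
};

// Flat: the only link between page properties is the naming prefix,
// so they have to be filtered out of the full list of properties.
const pageFromFlat = Object.fromEntries(
  Object.entries(flat).filter(([key]) => key.startsWith("page"))
);

// Nested: the category is an object, so one lookup returns everything.
const pageFromNested = nested.page;
```

In the developer console the nested version amounts to expanding a single object, while the flat version relies on every property having been prefixed consistently.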

Array vs Object data layer root

For nested data layers, the root can be either an array of objects or an object. If you have searched for data layer definitions before, you will have seen both approaches. Both are viable; you just need to consider which one suits you best.

If you choose the array approach, you will have to push the persisting data and the event data together each time an event/action happens. This means you will need to store the persisting data somewhere so it can carry over from one event to the next.
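A minimal sketch of the array approach could look like this. The persisted store, the pushEvent helper and the event names are hypothetical, and the merge is deliberately shallow.

```javascript
// Array root: every entry carries the persisting data plus the event data.
const dataLayer = [];

// The persisting data has to live somewhere outside the array
// (hypothetical in-memory store).
const persisted = {
  page: { name: "My Page" },
  user: { authenticated: true },
};

// Hypothetical helper: merge the persisted data with the event-specific
// data (shallow merge) and push the result as one entry.
function pushEvent(eventName, eventData) {
  dataLayer.push({ ...persisted, ...eventData, event: eventName });
}

pushEvent("pageView", {});
pushEvent("applicationStart", { application: { id: "A-1" } });
```

Each entry is self-contained, which is convenient for the consumer, but the platform has to keep the persisted store up to date between events.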

If you choose an object as the root, you will need to make sure that each time a new action/event happens, all properties present in the object are updated or removed as required. With this approach you can persist data directly in the data layer object and remove it only when specific logic requires it.

For example, if you have an application object, you need to remove it from your data layer once the previous event was an APPLICATION COMPLETE. The next event should not carry any application data; failing to remove it will inflate your application complete metrics.
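The cleanup rule for an object root can be sketched as follows. The applyEvent helper, the event property and the event names are assumptions made for this illustration; only the application object and its removal after completion come from the text.

```javascript
const digitalData = {
  page: { name: "Application form" },
  application: { id: "A-1", statusCode: "COMPLETE" },
  event: "applicationComplete",
};

// Hypothetical helper that applies a new event to the object root.
function applyEvent(layer, eventName, updates) {
  // If the previous event completed the application, drop the application
  // data so it does not leak into the new event.
  if (layer.event === "applicationComplete") {
    delete layer.application;
  }
  // Shallow update: top-level properties in `updates` replace existing ones;
  // everything else persists in the data layer.
  Object.assign(layer, updates, { event: eventName });
  return layer;
}

applyEvent(digitalData, "pageView", { page: { name: "Thank you" } });
```

After the call, digitalData still holds page data but no longer has an application property, so the page view cannot be counted as another completed application.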

You will also need a property that is an array. It serves as a notification layer so that you can easily watch for changes in your data layer.
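One possible way to build such a notification layer is to wrap the push method of the array so that listeners are called on every new entry. This is only a sketch: the events property name, the listeners list and the entry shape are assumptions, not part of any provider's definition.

```javascript
const digitalData = {
  page: { name: "My Page" },
  events: [], // hypothetical name for the notification array
};

// Callbacks interested in data layer changes (hypothetical registry).
const listeners = [];

// Wrap push on this one array: store the entries as usual, then notify
// every listener with the new entry and the current data layer state.
digitalData.events.push = function (...entries) {
  const length = Array.prototype.push.apply(this, entries);
  entries.forEach((entry) =>
    listeners.forEach((listener) => listener(entry, digitalData))
  );
  return length;
};

// Usage: a listener that records the names of the events it has seen.
const seen = [];
listeners.push((entry) => seen.push(entry.name));

digitalData.events.push({ name: "pageView" });
```

The platform only ever calls push on the array, and the tagging side decides what to do with each notification.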

Define a generic data layer

Now that I have explained the different types of data layer, I will explain how to arrive at a generic data layer definition.

One of the main mistakes in medium to large companies is the lack of a single data layer definition across the company. In most cases each business area operates on its own development cycle with its own development team. When you, as a member of the tagging team, request a data layer, they will fulfill your request, but it is unlikely that they will use the same property names or the same values for the same outcome.

One simple example I experienced: on the same platform but for different product types, one cell set applicationStatusCode to COMPLETED and another set it to COMPLETE. As a result, one cell's implementation would have produced application complete metrics in analytics while the other would not, because COMPLETE was not supported in the tagging logic.

Defining a generic data layer across your company gives you better data quality and more efficient tagging, and in the long run lets you implement auto-tagging across the different business areas, because the same data structure is expected for specific actions/events on the website.

Naming convention

Let’s define some data layer naming conventions up front.

snake_case vs camelCase: I always prefer the camelCase naming convention, and that is what we will use for the names of our properties.
Do not repeat the category name inside a nested object. When using a nested data layer, it is not necessary to repeat the data category in the object's properties. For example, if you have a page object, there is no need to use pageName inside it. Simply use name for the nested property; in your code, page.name is cleaner than page.pageName, as we already know we are navigating through the page object.
Do not use a leading underscore in property names.
Make property names as generic as possible. We want to reuse them across all platforms, so do not name them after a specific platform.
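The first three conventions above can be checked mechanically. The checkNaming helper below is a hypothetical sketch, not part of any library: it walks nested objects (it does not descend into arrays) and reports keys that have a leading underscore, are not camelCase, or repeat the name of their parent category. The fourth convention, generic naming, still needs a human review.

```javascript
// Hypothetical lint helper for data layer property names.
function checkNaming(node, parentKey = "", path = "") {
  const issues = [];
  for (const [key, value] of Object.entries(node)) {
    const fullPath = path ? `${path}.${key}` : key;

    if (key.startsWith("_")) {
      issues.push(`${fullPath}: leading underscore`);
    } else if (!/^[a-z][a-zA-Z0-9]*$/.test(key)) {
      issues.push(`${fullPath}: not camelCase`);
    }

    // page.pageName repeats the category name; page.name does not.
    if (parentKey && key.toLowerCase().startsWith(parentKey.toLowerCase())) {
      issues.push(`${fullPath}: repeats category name "${parentKey}"`);
    }

    // Recurse into nested objects only.
    if (value && typeof value === "object" && !Array.isArray(value)) {
      issues.push(...checkNaming(value, key, fullPath));
    }
  }
  return issues;
}

const issues = checkNaming({
  page: { pageName: "My Page", url: "www.domain.com/something" },
  user: { _id: "111", user_type: "Type1" },
});
// issues lists: page.pageName (repeats category), user._id (leading
// underscore), user.user_type (not camelCase and repeats category)
```

A check like this could run in a unit test on each platform so that naming drift is caught before it reaches the tagging team.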

Use JSON schema definition to define your data layer

One of the main challenges you will face is choosing the right way to share the data layer definition across your company.

In my early attempts I used Confluence to document my data layer definition. While it worked initially, it soon became really complex to maintain, as I had to update multiple pages when adding a property (I used one page per object, so I needed to update the child and then all of its parents each time).

I then stumbled upon JSON Schema. It allows you to define your data layer in a detailed and logical manner. You can provide the definition directly to your developers, who should easily understand what is required, what the limitations are and which validations need to be run.
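As a small illustration, here is what a schema for an application object might look like, together with a deliberately tiny validator. The property names and the STARTED value are assumptions for this sketch (only COMPLETE comes from the example earlier in the post), and the validate function is hypothetical: it understands only type, required, properties and enum. In practice you would use a full JSON Schema validator library instead.

```javascript
// Hypothetical schema for an application object.
const applicationSchema = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  type: "object",
  required: ["id", "statusCode"],
  properties: {
    id: { type: "string" },
    // The enum makes the accepted values explicit for every team.
    statusCode: { type: "string", enum: ["STARTED", "COMPLETE"] },
  },
};

// Tiny validator supporting only: type, required, properties, enum.
function validate(schema, value, path = "application") {
  const actualType = Array.isArray(value)
    ? "array"
    : value === null
    ? "null"
    : typeof value;
  if (schema.type && schema.type !== actualType) {
    return [`${path}: expected ${schema.type}, got ${actualType}`];
  }
  const errors = [];
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${path}: "${value}" is not one of ${schema.enum.join(", ")}`);
  }
  if (actualType === "object") {
    for (const key of schema.required || []) {
      if (!(key in value)) {
        errors.push(`${path}.${key}: missing required property`);
      }
    }
    for (const [key, subSchema] of Object.entries(schema.properties || {})) {
      if (key in value) {
        errors.push(...validate(subSchema, value[key], `${path}.${key}`));
      }
    }
  }
  return errors;
}

const ok = validate(applicationSchema, { id: "A-1", statusCode: "COMPLETE" });
const bad = validate(applicationSchema, { id: "A-1", statusCode: "COMPLETED" });
```

With the accepted values written into the schema, a mismatch such as COMPLETED vs COMPLETE is reported at validation time instead of surfacing later as missing metrics.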

Our generic data layer

Read the full article on dev.to
