SOLVED

Does not exist vs Exists (exclude)

Level 4

I have a question about the difference between a segment using does not exist and exists but then excluding it.

I am trying to get data for a conversion rate (new upload/unique visitors) and to see what it is when I exclude a specific event.

So, I've created the segment in two ways:

1. Visitor level - eventX does not exist

2. Visitor level - eventX exists (exclude)

They give me very different results (does not exist segment shows conversion of 1.91% and exclude segment shows 1.15%)  and I'm trying to figure out why.

I know these two segments should act differently on a visitor level segment but I can't wrap my head around the logic of why they act differently.

Any explanation would be greatly appreciated.

Thanks,


Frank

1 Accepted Solution

Correct answer by
Level 4

Hello Frank,

You are correct that the two examples leave you with VERY different data.  I'll try to break it down, working from the innermost part of the segment outward.  This comes up all the time when I review others' reporting or during training sessions; I'd even say it's the most common misconception about how segments work.

1. Visitor level - eventX does not exist

Working from the inside out, we start with the criterion "eventX does not exist".  Once we find a record where eventX does not exist, we look at the level of the container (this segment is at Visitor level).  A Visitor-level container means we keep all the data for that visitor (which includes the records where eventX actually exists).

Let's say that eventX happens somewhere deep in your site.  The first record of data is likely to be a pageview of a homepage - which doesn't have eventX, right?  Well, your segment found the pageview record for the homepage (eventX does not exist), and then kept all of the data anyway!  So the segment ends up not really filtering out any data.
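To make that concrete, here is a minimal Python sketch of the logic described above. It is a simplified model, not Adobe's actual implementation: the `hits` records and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical hit-level records: (visitor_id, page, has_eventX)
hits = [
    ("A", "home", False),
    ("A", "upload", True),   # eventX fires deep in the site
    ("B", "home", False),
    ("B", "pricing", False),
]

def include_visitors_where_event_missing(rows):
    """Visitor container, 'eventX does not exist' (include).

    A visitor qualifies as soon as ANY one of their records lacks eventX,
    and then ALL of that visitor's records are kept.
    """
    by_visitor = defaultdict(list)
    for row in rows:
        by_visitor[row[0]].append(row)

    kept = []
    for visitor_rows in by_visitor.values():
        if any(not has_event for _, _, has_event in visitor_rows):
            kept.extend(visitor_rows)
    return kept

kept = include_visitors_where_event_missing(hits)
```

In this toy data, visitor A's homepage record matches the condition, so A's eventX record comes along with it and nothing is filtered out.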

2. Visitor level - eventX exists (exclude)

This one is much more explicit in finding eventX records, so it works well.  Working again from the inside out, we start with the criterion "eventX exists."  Once we find a record that has eventX, we can then exclude all of the records from that visitor (hence the Visitor-level container again).

So in this case, if VisitorABC ever had a record with eventX within your timeframe, then ALL of his/her data would be thrown out and not counted in the report.
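A rough Python sketch of that exclude logic, again as a simplified model with made-up records and names rather than Adobe's real processing:

```python
# Hypothetical hit-level records: (visitor_id, page, has_eventX)
hits = [
    ("VisitorABC", "home", False),
    ("VisitorABC", "upload", True),   # one eventX record is enough
    ("VisitorDEF", "home", False),
    ("VisitorDEF", "pricing", False),
]

def exclude_visitors_where_event_exists(rows):
    """Visitor container, 'eventX exists' (exclude).

    Any visitor with at least one eventX record is dropped entirely,
    including their records that do not carry eventX.
    """
    flagged = {visitor for visitor, _, has_event in rows if has_event}
    return [row for row in rows if row[0] not in flagged]

remaining = exclude_visitors_where_event_exists(hits)
```

Here VisitorABC's homepage record is thrown out too, even though eventX is not on it, because the exclusion applies to the whole visitor.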

So, overall #1 is a very common mistake that people make in reporting, but hopefully this info clears up the difference a bit.
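For the conversion rate in the original question, a small worked example may help show why the two numbers diverge. This is only a sketch with invented data; the tuple layout and the `conversion_rate` helper are assumptions for illustration.

```python
# Hypothetical hits: (visitor_id, has_eventX, is_new_upload)
hits = [
    ("A", False, False),
    ("A", True, True),    # visitor A fires eventX and also uploads
    ("B", False, True),   # visitor B uploads without ever firing eventX
    ("C", False, False),
    ("D", False, False),
]

def conversion_rate(rows):
    """New uploads divided by unique visitors in the segmented data."""
    visitors = {visitor for visitor, _, _ in rows}
    uploads = sum(1 for _, _, uploaded in rows if uploaded)
    return uploads / len(visitors) if visitors else 0.0

has_hit_without_event = {v for v, has_event, _ in hits if not has_event}
has_hit_with_event = {v for v, has_event, _ in hits if has_event}

# Segment 1: Visitor level, eventX does not exist (include)
segment_1 = [h for h in hits if h[0] in has_hit_without_event]
# Segment 2: Visitor level, eventX exists (exclude)
segment_2 = [h for h in hits if h[0] not in has_hit_with_event]

rate_1 = conversion_rate(segment_1)  # 2 uploads / 4 visitors
rate_2 = conversion_rate(segment_2)  # 1 upload / 3 visitors
```

Segment 1 still contains visitor A (and A's upload), so its rate is essentially the unsegmented rate, while segment 2 has removed A completely.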

Another good example is the following:

- Visitor Level - PageName does not equal 'XYZ'

This segment would be true for just about every single visitor of your website (they see the homepage, or a different page, or a click event record, etc).  Thus, the segment has almost no effect.

4 Replies

Employee Advisor

Exactly what gflare has said.

You can also look at the answer I provided here for a VISIT container:

"Exclude" and "Does not contain" in segment

Level 4

Thanks very much for your reply.

Just curious, in what situation would you use 'does not exist' instead of exists (exclude)?

Thanks.

Level 4

There really isn't too much difference in most applications between doing a does not exist (include), and an exists (exclude) condition.  But when you start nesting conditions in your segmentation it's good to have both options (not to mention some people can conceptualize one function better than the other).

'Does not exist' is a great condition with many uses.  The reason it tends to get people into trouble is due to the container's level being visitor or visit, causing the user to include records that they were hoping to not include.

I prefer to teach using includes most of the time, so here is an example scenario whose thought process is "exclude" but which is actually accomplished with includes...

It's not exactly your question, but hey - it might help someone else reading this

Let's say you want to have a dataset that does not have any pageview (s.t calls) records in it.  You want all the records that are not pageviews, but your structure for s.tl calls is not consistent (let's assume you have some s.tl calls that are firing for errors, downloads, custom click events, custom exit clicks, overlayviews, or form submissions).

You can either try to identify all the other information you capture in the s.tl calls and include that, or you can just identify all of them with a "PageView does not exist" or "s.pagename does not exist" condition.  Any record that is not an s.t call is inherently an s.tl call (aside from other data sources, but we'll ignore that tangent).

But when you make this segment you have to think about the container level.  If you were to use an overall container level of 'visit' or 'visitor', then your segment would find an s.tl record, and then include everything at that level - which brings the pageviews right back in again.  If you use this condition at the 'hit' level, then you will have a resulting dataset without pageviews, as intended.
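Here is a small Python sketch of that container-level difference for the "page name does not exist" condition. It is a simplified model with hypothetical records (a `None` page name stands in for an s.tl call), not Adobe's implementation.

```python
# Hypothetical hits: (visit_id, call_type, page_name)
# page_name is None on s.tl (link tracking) records in this toy model.
hits = [
    ("visit1", "s.t", "home"),
    ("visit1", "s.tl", None),      # e.g. a download click
    ("visit2", "s.t", "pricing"),
]

def hit_container(rows):
    """Hit level: keep only the records that match the condition themselves."""
    return [row for row in rows if row[2] is None]

def visit_container(rows):
    """Visit level: one matching record pulls in every record of that visit."""
    matching_visits = {row[0] for row in rows if row[2] is None}
    return [row for row in rows if row[0] in matching_visits]

hit_result = hit_container(hits)      # only the s.tl record
visit_result = visit_container(hits)  # visit1's pageview comes back in
```

The hit-level version returns just the s.tl record, while the visit-level version also returns the "home" pageview from the same visit.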

It's good practice to think about an explicit condition first ("variable equals value") before trying the opposite ("variable does not equal value").  If there are a lot of possible equal conditions, then you'll be better off going the not-equal route within a container that's not going to bring that data back in.

However, don't write it off completely, because there are great ways of combining 'does not exist' or 'does not equal' with other conditions within a 'hit level container' to ensure the conditions are met within a single-record scope.  That's where the capabilities of segmentation really start to skyrocket, as long as you're extra careful in the understanding of the results.
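As a rough illustration of the single-record scope, the sketch below combines "page equals checkout" with "eventX does not exist" in two ways. The data, page name, and function names are all hypothetical, and the model is deliberately simplified.

```python
# Hypothetical hits: (visitor_id, page_name, has_eventX)
hits = [
    ("A", "checkout", False),  # both conditions true on the same record
    ("B", "home", False),      # eventX missing here...
    ("B", "checkout", True),   # ...but eventX present on B's checkout record
]

def both_conditions_on_same_hit(rows):
    """Hit-level container: both conditions must hold on one record."""
    return [row for row in rows if row[1] == "checkout" and not row[2]]

def conditions_anywhere_for_visitor(rows):
    """Visitor-level, no nested hit container: each condition may be
    satisfied by a different record belonging to the same visitor."""
    on_checkout = {row[0] for row in rows if row[1] == "checkout"}
    missing_event = {row[0] for row in rows if not row[2]}
    return sorted(on_checkout & missing_event)

same_hit = both_conditions_on_same_hit(hits)
loose_visitors = conditions_anywhere_for_visitor(hits)
```

Visitor B passes the loose visitor-level check because two different records each satisfy one condition, but only visitor A has a single record that satisfies both.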