Audience Lab - Data discrepancy in the split files at destination


Employee Advisor

Use Case: An Audience Lab test group is created from one base segment, "Segment-A", which is divided into two splits: "Target-90%" and "Control-10%".

Total segment size = 100

Expected result, based on the split:

Target Destination should get 90


Control Destination should get 10


Actual result:

Target Destination = 82

Control Destination = 18

Question: Why are we seeing such a discrepancy?


The following points explain how Audience Lab splits users across the outbound files:

- The split is done by computing a hash of the user's ID (a precedence rule determines which ID is used).

- The hash value is then used to obtain the percent bucket into which the user is placed.
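
The two steps above can be sketched roughly as follows. This is only an illustration of hash-based percent bucketing in general, not Audience Lab's actual implementation: the choice of SHA-256, the 0-99 bucket range, and the names `percent_bucket`, `assign_split`, and `split_counts` are all assumptions made for this example.

```python
import hashlib


def percent_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99.

    Assumption: SHA-256 stands in for whatever hash the product really uses.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to a percent.
    return int.from_bytes(digest[:8], "big") % 100


def assign_split(user_id: str, target_percent: int = 90) -> str:
    """Place the user in 'target' if the bucket falls below the target share."""
    return "target" if percent_bucket(user_id) < target_percent else "control"


def split_counts(user_ids, target_percent: int = 90) -> dict:
    """Count how many users land in each split."""
    counts = {"target": 0, "control": 0}
    for uid in user_ids:
        counts[assign_split(uid, target_percent)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical IDs; with only 100 users the result is rarely exactly 90/10.
    users = [f"user-{i}" for i in range(100)]
    print(split_counts(users))
```

Because each user's split depends only on the hash of their own ID, the assignment is stable across exports, but nothing forces the totals to land exactly on the configured percentages.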

The hash function provides a good spread of users, but for small numbers it cannot guarantee an exact split. Tests in the development environment showed a difference of ±2% with 1,000 users in two equal buckets (50-50). The deviation therefore gets worse when the bucket sizes differ by an order of magnitude (as in a 90-10 split) and when the number of users is this low.
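
To get a feel for the size of this effect, here is a rough estimate. It assumes the hash behaves like an independent, uniformly random assignment per user (a simple binomial model), which is an idealization and not a documented property of Audience Lab. The function name `split_std_dev` is hypothetical.

```python
import math


def split_std_dev(n_users: int, fraction: float) -> float:
    """Standard deviation, in users, of one split's size.

    Assumption: each user lands in the split independently with
    probability `fraction` (binomial model).
    """
    return math.sqrt(n_users * fraction * (1.0 - fraction))


if __name__ == "__main__":
    for n_users, fraction in [(1000, 0.5), (100, 0.1)]:
        sd = split_std_dev(n_users, fraction)
        expected = n_users * fraction
        print(
            f"n={n_users}, share={fraction:.0%}: expected {expected:.0f} users, "
            f"std dev {sd:.1f} users ({sd / n_users:.1%} of the segment, "
            f"{sd / expected:.1%} of the split)"
        )
```

Under this model, the 1,000-user 50-50 case has a standard deviation of about 16 users (about 1.6% of the segment), in line with the ±2% seen in testing. For 100 users and a 10% control split, it is about 3 users, which is roughly 30% of the expected control size, so the relative error on the small bucket is much larger.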

To conclude, the split will not match the configured percentages exactly, and the exported numbers will always carry some margin of error.
