Audience Lab - Data discrepancy in the split files at destination




Use Case: An Audience Lab test group is created from one base segment, "Segment-A", which is divided into two splits: "Target-90%" and "Control-10%".

Total segment size = 100

Expected result, based on the split:

Target Destination should get 90


Control Destination should get 10


Actual result:

Target Destination = 82

Control Destination = 18

Question: Why are we seeing such a discrepancy?


The following points explain how Audience Lab splits users across the outbound files:

- The split is done by computing a hash of the user's id (a precedence rule determines which id is used).

- The hash value is then used to obtain the percent bucket into which the user falls.
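
The two steps above can be sketched as follows. This is only an illustration and not Audience Lab's actual implementation: the choice of SHA-256, the 0-99 bucket range, and the function names `percent_bucket`, `assign_split` and `split_counts` are all assumptions made for the example.

```python
import hashlib

def percent_bucket(user_id: str) -> int:
    """Map a user id to a bucket in 0..99 using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a percent bucket.
    return int.from_bytes(digest[:8], "big") % 100

def assign_split(user_id: str, target_percent: int = 90) -> str:
    """Buckets below target_percent go to the target split, the rest to control."""
    return "target" if percent_bucket(user_id) < target_percent else "control"

def split_counts(user_ids, target_percent: int = 90) -> dict:
    """Count how many users land in each split."""
    counts = {"target": 0, "control": 0}
    for uid in user_ids:
        counts[assign_split(uid, target_percent)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical ids; the resulting counts are close to 90/10 but not guaranteed to match.
    print(split_counts([f"user-{i}" for i in range(100)]))
```

Because each user is assigned independently from its own hash, the same user always lands in the same split, but nothing forces the totals to come out at exactly 90 and 10.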

The hash function spreads users well, but for small populations it cannot guarantee an exact split. Tests in the development environment showed a difference of ±2% with 1000 users in 2 equal buckets (50-50). The deviation gets worse when the bucket sizes differ by an order of magnitude, and when the number of users is this low.
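
To get a rough feel for the size of this effect, one can model each user as an independent draw that lands in a bucket with probability equal to the bucket's share. This binomial model is an assumption about how the hash behaves, not a documented property of Audience Lab, and `expected_spread` is a hypothetical helper name.

```python
import math

def expected_spread(n_users: int, fraction: float):
    """Return (mean, standard deviation) of a bucket's size under a binomial model."""
    mean = n_users * fraction
    std = math.sqrt(n_users * fraction * (1 - fraction))
    return mean, std

if __name__ == "__main__":
    for n, frac in [(1000, 0.5), (100, 0.1)]:
        mean, std = expected_spread(n, frac)
        # Relative spread: how large one standard deviation is compared to the bucket itself.
        print(f"n={n}, share={frac}: mean={mean:.1f}, std={std:.2f}, relative={std / mean:.1%}")
```

Under this model, one standard deviation for 1000 users split 50-50 is about 16 users (roughly 1.6% of the total), which is in line with the ±2% seen in testing. For 100 users and a 10% control bucket it is about 3 users, which is around 30% of the expected bucket size of 10.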

To conclude, the split will not exactly match the configured percentages, and the exported numbers will always carry some margin of error.