Hi Mike,
Thanks for the response. I was going through the content below:
1. Methods https://docs.adobe.com/content/help/en/target/using/implement-target/before-implement/methods/method...
2. Data collection for Target's personalization algorithms
3. Personalization Insights reports overview
Additionally, I would like to give you some quick context on what reward probabilities are.
For MAB (multi-armed bandit) algorithms, suppose I have 2 variants, A (control) and B (variation).
Based on visitor interactions, and depending on the CTRs (click-through rates) of the variants, we can derive reward probabilities for A and B. Let's say 1000 visitors interact with A and B in a single day, with A getting 50% of the traffic and B getting the other 50%. Out of the 500 hits on A, only 150 convert; out of the 500 hits on B, 300 convert. Each conversion counts as a reward (a binary value, 0 or 1). So in this case, the reward probability of A is 0.3 (150/500) and that of B is 0.6 (300/500). Of course, these values will change as more visitors interact with a typical A/B test activity. Ideally, these reward metrics serve as the input data for training the algorithms' models. This example is extremely simple; in practice, deciding the reward probability of each experience can be far more complex, since it is controlled by numerous factors.
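To make the arithmetic in the example above concrete, here is a minimal Python sketch. The function name `reward_probability` and the `observed` dictionary are hypothetical names I made up for illustration; this is not how Target computes anything internally, only the plain conversions-divided-by-hits calculation described above.

```python
def reward_probability(conversions: int, hits: int) -> float:
    """Empirical reward probability: the share of hits that converted."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    return conversions / hits


# Counts from the example: 1000 visitors in one day, split 50/50.
# Each entry is (conversions, hits) for that variant.
observed = {"A": (150, 500), "B": (300, 500)}

reward_probs = {
    name: reward_probability(conversions, hits)
    for name, (conversions, hits) in observed.items()
}

print(reward_probs)  # {'A': 0.3, 'B': 0.6}
```

As more visitors arrive, the counts in `observed` grow and the estimates are simply recomputed, which is why the values keep shifting during the activity.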
I hope this gives you some insight into reward probabilities.
Please let me know if you have any other questions.
Thanks.