☕[AT Community Q&A Coffee Break] 10/14/20, 8am PT: Jon Tehero, Group Product Manager for Adobe Target☕ [SERIES 2]

Amelia_Waliany

Employee

29-09-2020

Join us for our next monthly Adobe Target Community Q&A Coffee Break

taking place Wednesday, October 14th @ 8am PT

👨‍💻👩‍💻Register Now!👨‍💻👩‍💻

We'll be joined by Jon Tehero aka @Jon_Tehero, Group Product Manager for Adobe Target, who will be signed in here to the Adobe Target Community to chat directly with you on this thread about your Adobe Target questions pertaining to his areas of expertise:

  • AI improvements
  • A4T for Auto-Target
  • Slot-based Recommendations
  • General Adobe Target backend & UI

Want us to send you a calendar invitation so you don’t forget? Register now to mark your calendar and receive reminders!

🎯A NOTE FROM OUR NEXT COMMUNITY Q&A COFFEE BREAK EXPERT, JON TEHERO 🎯

 

 

🎯REQUIREMENTS TO PARTICIPATE 🎯

  • Must be signed in to the Community during the 1-hour period
  • Must post a Question about Adobe Target
  • THAT'S IT! (Think of this as the Adobe Target Community equivalent of an AMA, or “Ask Me Anything”, and bring your best speed-typing game.)

🎯INSTRUCTIONS 🎯

  • Click the blue “Reply” button at the bottom right corner of this post
  • Begin your Question with @Jon_Tehero 
  • When exchanging messages with Jon about your specific question, be sure to use the editor’s "QUOTE" button, which will indicate which post you're replying to, and will help contain your conversation with Jon

Jon Tehero is a Group Product Manager for Adobe Target. He’s overseen hundreds of new features within the Target platform and has played a key role in migrating functionality from Target's classic platforms into the new Adobe Target UI. Jon is currently focused on expanding the Target feature set to address an even broader set of use-cases. Prior to working on the Product Management team, Jon consulted for over sixty mid- to enterprise-sized customers, and was a subject matter expert within the Adobe Consulting group.

 

Curious about what an Adobe Target Community Q&A Coffee Break looks like? Check out the threads from our first series of Adobe Target Community Q&A Coffee Breaks.

A4T AI AImarketing Artificial Intelligence Auto-Target Coffee Break Product Management Q&A Recommendations

Replies

Shani2

30-09-2020

Hi @Jon_Tehero, thank you for your time today. These coffee break sessions are great 🙂  I wanted to share our experience with offline significance calculations for A4T. As a whole, the process is quite time-consuming, and most of our experiments require offline calculations. It would be more practical if the Target UI, Analytics reporting, or the A4T Workspace panel could compute calculated metrics, but even improving the performance of the Data Warehouse UI would be a great improvement, since Data Warehouse is a requirement for those of us who select A4T as the reporting source in the activity.

With the current process, monitoring tests does not seem practical because of the manual steps involved, yet monitoring is a requirement for high-risk tests and there is no way around it: we can't wait for a test to reach the required sample size or number of conversions before ending it and completing the significance analysis.

Because Analytics metrics are continuous variables, we currently have to take these steps to perform offline significance calculations for A4T activities: 1) create multiple segments that are compatible with Data Warehouse; 2) pull various reports from Data Warehouse to break the data down into a digestible format; 3) once the report is available, sometimes hours or a day or two later, enter formulas in Excel to calculate visitors and compute the sum of squares of the success metric; and 4) input the data into the Excel confidence calculator.
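A minimal sketch of what steps 3 and 4 compute, assuming the confidence calculator is a two-sample, Welch-style test on a per-visitor metric with a normal approximation. The function names and the (visitors, sum, sum of squares) input layout are hypothetical, not the spreadsheet's actual interface, and any numbers passed in would be your own Data Warehouse aggregates.

```python
import math
from statistics import NormalDist


def summarize(visitors, metric_sum, metric_sum_sq):
    """Return (mean, squared standard error) for one experience from its aggregates."""
    mean = metric_sum / visitors
    # Unbiased sample variance recovered from the sum and the sum of squares.
    variance = (metric_sum_sq - visitors * mean ** 2) / (visitors - 1)
    return mean, variance / visitors


def confidence(control, challenger):
    """Return (relative lift, z statistic, two-sided confidence) for challenger vs. control.

    Each argument is a (visitors, metric_sum, metric_sum_sq) tuple.
    Assumption: samples are large enough for a normal approximation.
    """
    mean_a, se2_a = summarize(*control)
    mean_b, se2_b = summarize(*challenger)
    z = (mean_b - mean_a) / math.sqrt(se2_a + se2_b)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    lift = (mean_b - mean_a) / mean_a
    return lift, z, 1 - p_value
```

Calling `confidence((visitors_a, sum_a, sum_sq_a), (visitors_b, sum_b, sum_sq_b))` with the three aggregates per experience would replace the manual spreadsheet entry, although it does not remove the wait for the Data Warehouse report itself.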

In general, the process makes monitoring tests difficult and very time-consuming, and I would go as far as to say it may even discourage monitoring altogether because it requires so much effort. The level of effort isn't ideal even after a test ends, but redoing these steps at weekly intervals, for example, while a high-risk test is running isn't practical.

Is there an improvement to this process on the roadmap, or any recommendation on how to create efficiencies with existing functionality? I couldn't find much in the Adobe Cloud documentation about alternative solutions, but I was hoping you could provide more insight into future improvements or other ways to achieve the same result with less effort.

Thank you!

Shani2

13-10-2020

Hi again, @Jon_Tehero. I have a statistical question. Is there a feature on the roadmap that would let the user select a one-tailed test as the statistical method in the Target UI during the experience setup? Currently, both the Target statistical engine and the Sample Size calculator default to a two-tailed test. Technically, we can manually adjust the significance level in the sample size calculator to convert to a one-tailed test and then manually end the test based on a given set of parameters; however, it would take far less effort if we could simply select one-sided vs. two-sided tests in the UI. When testing for a specific direction (i.e. positive or negative), one-tailed tests require less time to run and have a lower Type II error probability than two-tailed tests at the same significance level, because alpha is not halved. If, for example, we are truly testing for superiority, then testing for a negative impact has no value, and the cost is a test that runs longer than the business question requires. As we look for more ways to be agile and efficient with our testing program, I am eager to learn whether alternative ways of computing statistical significance are on Target’s product roadmap. Thank you!
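To put a rough number on the time saved, here is a sketch using the standard normal-approximation sample size formula for comparing two conversion rates. This is a textbook formula, not necessarily the one behind Target's Sample Size calculator, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_experience(baseline_rate, relative_lift,
                            alpha=0.05, power=0.80, two_sided=True):
    """Approximate visitors needed per experience to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z = NormalDist().inv_cdf
    # A two-sided test splits alpha across both tails; a one-sided test does not.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


two_tailed = visitors_per_experience(0.05, 0.10, two_sided=True)
one_tailed = visitors_per_experience(0.05, 0.10, two_sided=False)
print(two_tailed, one_tailed, round(one_tailed / two_tailed, 3))
```

Under these assumptions (alpha of 0.05 and 80% power), the one-tailed design needs roughly a fifth fewer visitors per experience, which is the efficiency gain the question is about.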

Shani2

13-10-2020

@Jon_Tehero  Sorry, I am full of questions; you can tell I couldn’t wait for this event 😊 This should be my last one, and it’s about the A/B Auto-Allocate (AA) algorithm. The concept and mechanism of AA is a great one; however, I think there are some limitations in one particular use case. I won’t go into the benefits of AA, as those are plentiful, but I do want to highlight one specifically: optimization occurs in parallel with learning.


When there are only 2 experiences, I think the current algorithm logic carries a real risk of a false-positive rate higher than 5%: after the better-performing experience reaches 95% confidence, 100% of traffic is allocated to the experience identified as the winner. This is unlike the logic for 3 or more experiences, in which 80% of traffic is allocated to the winner and 20% continues to be served randomly across all experiences. That reserve is key if user behavior shifts and confidence intervals begin to overlap with other experiences while the test is running.
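The allocation rule as described in this post can be sketched as follows. This models only the behavior stated above once a winner is declared; it is not Target's actual implementation, the function name is hypothetical, and the even split before a winner exists is an assumption made purely for illustration.

```python
def traffic_split(experiences, winner=None, exploit_share=0.80):
    """Return the share of traffic per experience under the rule described in the post."""
    n = len(experiences)
    if winner is None:
        # Assumption for illustration: even split until a winner is declared.
        return {name: 1 / n for name in experiences}
    if n == 2:
        # Two experiences: all traffic goes to the winner, so nothing is left for learning.
        return {name: (1.0 if name == winner else 0.0) for name in experiences}
    # Three or more: 80% to the winner, 20% served randomly across all experiences.
    explore_each = (1 - exploit_share) / n
    return {name: explore_each + (exploit_share if name == winner else 0.0)
            for name in experiences}
```

With two experiences the losing experience receives no further traffic, so its confidence interval can never be updated, which is the gap the question points at.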


I’ve encountered a few cases with Target’s manual A/B tests in which the stats engine called a winner early and displayed a badge in the activity, but after hours, days, or weeks of collecting more data, the engine removed the badge because confidence levels were still overlapping or fluctuating. This is a prime example of why it is important to determine sample size and test parameters before running a test, both to avoid ending a test prematurely and to ensure statistically valid results. It is also why I raise my concern with the AA logic for the 2-experience scenario specifically. Currently, there is no room for the algorithm to correct itself if it identifies as the winner an experience that truly was not, because no traffic is reserved for learning in case user behavior changes. In this use case it is not truly a multi-armed bandit approach, because once 95% confidence is reached, optimization no longer occurs in parallel with learning.
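The effect of calling a winner at the first significant reading can be illustrated with a small A/A simulation, in which both experiences share the same true conversion rate so that every "winner" is a false positive. This is a generic sketch of repeated significance checks with a pooled two-proportion z-test; it does not reproduce Target's stats engine or the Auto-Allocate algorithm, and all names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled z-test for two proportions with n visitors per experience."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return False  # no variation observed yet, so no test is possible
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate(runs=500, looks=10, visitors_per_look=100, rate=0.10, seed=1):
    """Return (share of runs significant at any look, share significant at the final look)."""
    rng = random.Random(seed)
    any_look = final_look = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        flagged = significant = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            significant = is_significant(conv_a, conv_b, n)
            flagged = flagged or significant
        any_look += flagged
        final_look += significant
    return any_look / runs, final_look / runs


any_look_rate, final_look_rate = simulate()
print(any_look_rate, final_look_rate)
```

The first number counts runs that would have been stopped at the first significant reading, and the second counts runs judged only at the planned sample size. The first can never be smaller than the second, and the gap between them is the extra false-positive risk that comes from acting on an early call without further learning.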


Another concern with the algorithm's logic for two experiences is that, hypothetically, we cannot detect a novelty effect, because the algorithm may declare an experience a winner too early. We have observed novelty effects in manual A/B tests after adding an attention-grabbing new feature: for the first two weeks a challenger may outperform the default experience and display a badge, but the positive effect wears off as more data is collected, confirming that the lift was only an illusion.


In sum, I hesitate to use AA for 2 experiences because of the current AI logic. The dilemma is that our organization rarely tests more than 2 experiences. Are there any suggestions on how we can mitigate false positives for AA with 2 experiences? Is enhancing the algorithm for two experiences on the roadmap, so that it serves as a true multi-armed bandit approach to optimization? Lastly, will users be able to set the significance level for AI-driven activities? Not all tests are created equal: they do not carry the same risks and costs, so some tests may require a false-positive rate lower or higher than 5%.

Please note that I am aware of the time-correlated caveat for AA, and that the experiences in the manual A/B tests I discussed above were not contextually varying.

Thank you!

Jon_Tehero

Employee

14-10-2020

Hello everyone! I am looking forward to chatting with you in a few minutes and answering your questions.

frihed30

14-10-2020

 


@Jon_Tehero 
What is the best way to combine testing methods (A/B testing while personalising, i.e. XT and A/B) to make sure that you are always improving?

peterhartung

Employee

14-10-2020

Hi Everyone! I'm looking forward to hearing from Jon today. He always impresses with the depth and breadth of his Target and Recommendations knowledge.

mravlich

14-10-2020

@Jon_Tehero  Hi -> with the new A4T view in workspace, will we ever use calculated metrics within the view to get confidence levels? Any tips for using this now? Thanks.

Jon_Tehero

Employee

14-10-2020



Hi Shani2,

 

Thank you for your question! We've received a lot of requests to support calculated metrics, and we know that this would improve the overall workflow for our customers. My peers on the Analytics product management team have this feature in their backlog, but we do not have any specific dates at this time.

 

If you have access to the Experience Platform and have your analytics data landing on platform, the query service is probably the best option for achieving this today.

drewb6915421

14-10-2020

@mravlich  The Analytics team is working on enabling support for calculated metrics, but the complexity arises from how Analytics collects data based on visitors. There are good suggestions on best practices for using A4T and success metrics on this Spark page: https://spark.adobe.com/page/Lo3Spm4oBOvwF/