Application Context:
We have an AI Chat interface that allows users to ask questions and receive AI-generated responses. The AI model is tuned using our proprietary content and industry trends.
Current Analytics Implementation:
We currently track user attributes and the conversation ID (associated with each chat session) as dimensions, and we capture events such as message sent, message received, etc., as metrics.
The conversation ID is stored in an eVar that persists at the hit level.
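For illustration, a minimal sketch of that hit-level pattern with AppMeasurement might look like the following; the s object usage is standard, but the eVar and event numbers are placeholders chosen here, not our actual variable mapping.

declare const s: any; // AppMeasurement instance already loaded on the page

// Sketch of the existing pattern: conversation ID in a hit-level eVar,
// message sent / message received as counter events (numbers are placeholders).
function trackChatEvent(conversationId: string, action: 'sent' | 'received'): void {
  const event = action === 'sent' ? 'event1' : 'event2';
  s.eVar10 = conversationId;          // conversation ID eVar (hit-level expiration)
  s.events = event;
  s.linkTrackVars = 'events,eVar10';
  s.linkTrackEvents = event;
  s.tl(true, 'o', 'AI Chat: message ' + action); // custom link call, no page view
}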
Reporting Today:
Our existing reporting focuses on quantitative metrics such as users, visits, number of messages sent, etc., grouped by user attributes.
New Requirement:
The business now wants to report on qualitative attributes of each conversation. These attributes are calculated by our AI team in the ETL pipeline and include items such as:
Conversation intent
Conversation topic
Number of meaningful keywords
Response accuracy
Other qualitative indicators of engagement
Potential Approaches:
1. Send attributes in real time from the AI API to the frontend:
The AI team would append these qualitative values to the API response so the frontend can send them to Adobe Analytics in real time (a rough sketch of this pattern follows this list).
2. Enrich the existing conversation ID after the fact:
a. Using Classifications (SAINT):
We could classify the conversation ID with the additional attributes. However, I am unsure about classification volume limits.
Today, we process fewer than 300K conversation IDs per month, but this number is expected to grow. I’m also unclear about how classification will behave once the conversation ID dimension exceeds 2M+ unique values per month—specifically, how values appear in reporting when Adobe begins grouping high-volume unique IDs into the Low Traffic bucket.
b. Using Transaction ID with a Data Source:
We could send the qualitative attributes offline, using the conversation ID as the Transaction ID. However, Adobe documentation states that the Transaction ID must be unique per hit. In our case, a conversation will have multiple hits (message sent, message received, etc.), so I’m unsure whether this approach is valid.
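For reference, here is a rough sketch of what option #1 could look like on the frontend, assuming an AppMeasurement implementation; the response field names and the eVar/event numbers below are placeholders for illustration, not our actual solution design.

declare const s: any; // AppMeasurement instance already loaded on the page

// Hypothetical shape of the enriched API response under option #1
interface ChatApiResponse {
  conversationId: string;
  answer: string;
  intent?: string;            // qualitative attributes appended by the AI API
  topic?: string;
  meaningfulKeywords?: number;
  responseAccuracy?: number;
}

function trackMessageReceived(res: ChatApiResponse): void {
  s.eVar10 = res.conversationId;      // existing conversation ID eVar
  s.eVar20 = res.intent ?? '';
  s.eVar21 = res.topic ?? '';
  s.eVar22 = String(res.meaningfulKeywords ?? '');
  s.eVar23 = String(res.responseAccuracy ?? '');
  s.events = 'event2';                // existing "message received" counter
  s.linkTrackVars = 'events,eVar10,eVar20,eVar21,eVar22,eVar23';
  s.linkTrackEvents = 'event2';
  s.tl(true, 'o', 'AI Chat: message received');
}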
Request for Guidance:
Given the above, I’m looking for suggestions on the ideal approach: #1, #2a, or #2b.
Option #1 is the most straightforward from an analytics perspective but requires additional frontend and API development, which we would prefer to minimize if possible.
Thanks,
Nitesh
Hi @nitesh__anwani,
This is an interesting need. I have a question that might add some additional considerations (and complexity):
Are you using Raw Data Feeds, and if so, how often are the files delivered?
Classifications aren't available in Data Feeds, and since Data Sources data is stitched in after the fact, the information may not be stitched yet when an export runs and would be missing from those files.
Option #1, tracking in real time, means the information is available in all sources.
I haven't implemented Data Sources myself, but I know they can be difficult to stitch properly and will need a lot of testing to make sure everything is connected correctly.
Classifications are a decent option if you don't need the values in Raw Data Feeds. I don't think there would be any hard limitations here: you can create the appropriate classifications and map the data in, so long as you can reliably get your classification file into Adobe in a timely manner.
If I were doing this, I would push for Option #1, but we have hourly Raw Data Feeds, and that information would definitely be better to have in the data lake. Yes, the data team could map it in on their side, but why do double mapping that may not match 100% across the various sources?
Hi @Jennifer_Dungan ,
Thanks for your response!
We use Adobe’s raw data feed exports daily in our data warehouse. Since these supplemental data points are pushed from our system into Adobe Analytics, we already have the corresponding qualitative attributes stored in our warehouse. We can combine these with Adobe’s data feeds as needed. Although the data resides in two different tables, we can still map it using the conversation ID that is present both in the Adobe data feeds and in our AI conversation logs. Therefore, we do not rely on Adobe’s data feeds to provide these qualitative attributes.
I have designed the solution as follows:
Capture real-time attributes from our backend system and track them in Adobe directly from the frontend, consistent with our existing implementation. This also helps avoid any “Low Traffic” buckets for qualitative attribute reporting.
Use classifications for attributes that cannot be captured in real time. Although any dimension value that falls under the “Low Traffic” bucket will also display “Low Traffic” for its classified value, this should not pose an issue. The “Low Traffic” grouping appears only after exceeding two million unique values per month, and we are currently well below that threshold.
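For the classification piece, a rough sketch of how we could build the tab-delimited import body from our warehouse rows is below. The row shape, column names, and value formats are placeholders for illustration; the real upload would start from the template downloaded from the Classification Importer (keeping its header rows), with the Key column matching the conversation ID values captured in the eVar.

// Hypothetical warehouse row shape (field names are assumptions)
interface ConversationAttributes {
  conversationId: string;     // same value captured in the conversation ID eVar
  intent: string;
  topic: string;
  meaningfulKeywords: number;
  accuracyBand: 'low' | 'medium' | 'high';
}

// Builds the tab-delimited Key row plus data rows of a classification import file.
function buildClassificationRows(rows: ConversationAttributes[]): string {
  const header = ['Key', 'Conversation Intent', 'Conversation Topic',
                  'Meaningful Keywords', 'Response Accuracy'].join('\t');
  const body = rows.map(r => [
    r.conversationId,
    r.intent,
    r.topic,
    String(r.meaningfulKeywords),
    r.accuracyBand,
  ].join('\t'));
  return [header, ...body].join('\n');
}

// Example usage
console.log(buildClassificationRows([
  { conversationId: 'conv-10482', intent: 'billing question',
    topic: 'invoice download', meaningfulKeywords: 7, accuracyBand: 'high' },
]));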
I appreciate your response and how precisely you highlighted the data feed considerations. I understand that each of these approaches has its own pros and cons, but I still wanted to hear different perspectives.
Thanks,
Nitesh
Glad to help. Every implementation is going to be slightly different when weighing the pros and cons, and the "best" solution will vary depending on many factors.
It's good that you already have a mapping strategy. For most teams that would be a new undertaking, and the design and testing phases would be a significant effort, potentially even slowing down the project.