Hello.
I'm working on using the AEP Query Service and am getting several errors. Ultimately what I'm trying to do is take our website clickstream data and aggregating to a visitor + session start time level and then push that data into a new AEP dataset so I can use CJA to view the information
Here is my query, that when I run it in the Query Service as a template it seems to work fine
WITH TimeGaps AS ( SELECT identityMap, _id, TO_TIMESTAMP(timestamp, 'MM/dd/yyyy, h:mm a') AS event_time, LAG(TO_TIMESTAMP(timestamp, 'MM/dd/yyyy, h:mm a')) OVER (PARTITION BY CAST(identityMap AS STRING) ORDER BY TO_TIMESTAMP(timestamp, 'MM/dd/yyyy, h:mm a')) AS prev_event_time FROM blue_acorn_ici_website WHERE CAST(identityMap AS STRING) LIKE '%60236472619724576034132552261527929983%' ),
TempTimeGaps AS ( SELECT *, CASE WHEN (event_time - prev_event_time) IS NULL OR (event_time - prev_event_time) > INTERVAL '1800' SECOND THEN 1 ELSE 0 END AS reset_counter, SUM(CASE WHEN (event_time - prev_event_time) IS NULL OR (event_time - prev_event_time) > INTERVAL '1800' SECOND THEN 1 ELSE 0 END) OVER (PARTITION BY CAST(identityMap AS STRING) ORDER BY event_time) AS session_number, ROW_NUMBER() OVER (PARTITION BY CAST(identityMap AS STRING), SUM(CASE WHEN (event_time - prev_event_time) IS NULL OR (event_time - prev_event_time) > INTERVAL '1800' SECOND THEN 1 ELSE 0 END) OVER (PARTITION BY CAST(identityMap AS STRING) ORDER BY event_time) ORDER BY event_time) AS event_hit_depth FROM TimeGaps ORDER BY CAST(identityMap AS STRING) ASC, event_time ASC ),
AggregatedTimeGaps AS ( SELECT SUBSTRING(CAST(identityMap AS STRING),POSITION('ous,', CAST(identityMap AS STRING))+ 5,POSITION(', true', CAST(identityMap AS STRING)) - POSITION('ous,', CAST(identityMap AS STRING)) - 5) AS webSession_id, session_number AS webSession_number, COUNT(_id) AS webSession_record_count, MIN(event_time) AS webSession_start_time, MAX(event_time) AS webSession_end_time FROM TempTimeGaps GROUP BY CAST(identityMap AS STRING), session_number )
SELECT * FROM AggregatedTimeGaps ORDER BY CAST(webSession_id AS STRING) ASC, webSession_number ASC;
The results I were able to copy and paste from manually running the query are attached (excel) and they include everything that I'm looking for.
Using the 'schedule' option for my query template:
I first tried to push the results into an existing data set, but got an error message saying the 'dataset already exists'. Which is strange because I already knew it existed and I was trying to push the data into it.
Also attached is the schema for the existing dataset that I was trying to use. I'm pretty confident that all the column names are the same, so I don't believe that to be the issue.
But then I tried to create a new dataset (again using the scheduled query template option) and I got the attached 58000 error codes. I used the 'create new dataset' option and these error message are telling me to remove fields from my query. Doesn't make a lot of sense to me.
Again, my goal is to get the results of this query into a dataset that I can use in CJA. I want to be able to run this every day and to ideally to replace the data that is already in the dataset (keeping the dataset name the same every day), but if appending the data is the only option, than I can work with that as well.
any suggestions or thoughts as to why
a) when using the existing data set option I'm getting an error saying the dataset already exists?
b) while using the create a new data set option, i'm getting the 58000 error codes?
Thanks so much for your help and guidance on how to fix this.
Solved! Go to Solution.
Topics help categorize Community content and increase your ability to discover relevant content.
a) When Select query is used for schedule, it always tries to do CTAS, where a dataset is expected to be created and populated with the data.
Instead 'insert into "existing dataset" select **' , should solve this.
b) Regarding the 58000 error codes, this is ocurring, since the schema structure is different.
Try wrapping the fields using a struct() and name the encapsulated object as the _tenant name.(_ic***) and then try ingesting, this would solve this.
Hope this helps.
a) When Select query is used for schedule, it always tries to do CTAS, where a dataset is expected to be created and populated with the data.
Instead 'insert into "existing dataset" select **' , should solve this.
b) Regarding the 58000 error codes, this is ocurring, since the schema structure is different.
Try wrapping the fields using a struct() and name the encapsulated object as the _tenant name.(_ic***) and then try ingesting, this would solve this.
Hope this helps.
@jlinhardt_blueAcorn Did you find the suggested solutions helpful? It would be great if you can mark the answer as correct for posterity. If you have found out solution yourself, share it with wider audience in the community.
Views
Replies
Total Likes
Yes, your comment about 'STRUCT' was a bit vague and didn't provide much detail, but I was able to figure it out.
In short, for the rest of the community, when creating a dataset from an SQL query, you need to use the STRUCT approach to account for any of your 'custom' schema nesting and custom fields.
Meaning, in your schema, if you add a custom field, the newly created field will be added to your schema under a nested folder, that nested folder will have your organization's name it.
For example, my schema was using the AEP WEB SDK Experience Event Field Group. That field group included the nested structure of:
Website
-- Device
---- deviceType
-- Environment
-- Implementation Details
-- Web
-- _Experience
_id
eventType
timestamp
identityMap
But then I added some custom fields to the schema and those custom fields showed up in a new folder with my org id. For example, my schema now looked like:
Website
-- Device
----- deviceType
-- Environment
-- Implementation Details
-- Web
-- _Experience
--_org id
------- custom field name
_id
eventType
timestamp
identityMap
So when I ran an SQL Query for the purpose of making another dataset, I had to group my 'custom field names' within the STRUCT statement. For example:
Select
_id,
eventType,
timestamp,
identityMap,
device.deviceType,
STRUCT(
orgid.customFieldName
) AS _orgId
From dataset
In the above SQL, the first few fields were referenced directly because they were, per the schema, part of the root level. The Device Type reference had the 'Device' nesting folder included in the device.deviceType syntax to include the sub root folder of 'Device'. However device.deviceType did not require the STRUCT approach because it was pre-defined by the Schema Field Group (ie, not added manually to the schema). Since the customFieldName was added manually to the schema, it was added under a sub root folder with my org id in it. As such my SQL statement had to include the STRUCT statement as well as the orgid.customFieldName field name reference. Also note that the STRUCT statement ended with me adding "AS _orgId". This must match the orgId folder that is in your schema.
In other words, the STRUCT statement let the AEP Query Service find a custom sub root folder within the schema, called "_orgid"
the end result of the above was a SQL query results that looks something like
_id eventType timeStamp identifyMap device _orgId
{id} {pageView} {01:00:00} {ECID=1350} {desktop} {custom field value}
When creating datasets from SQL statements, the fields need to be grouped to mirror the nesting that exists within your Schema.
I hope that helps explain why we you need to use STRUCT within the AEP Query Service, again when creating a new dataset. STRUCT is not necessary if you're just running an SQL to pull actual results (i.e., not creating a new dataset from your SQL query).
Views
Likes
Replies