I have a question about the lookup files that get exported with the clickstream data. We export data every hour, and in each export I get a set of lookup files.
The lookup files contains two fields - an id and a value. The lookups are essentially dimension tables, the main clickstream data set is the fact table.
As I make each export available in our data warehouse I have a choice - depending on what assumptions I want to make about the lookup tables:
- Denormalize and store : apply the appropriate lookups for each hourly clickstream file and store data in denormalized form - instead of browser_id store the browser string from the lookup table. So if I am processing the 10am clickstream file I could join on the 10am lookup files and write the denormalized data to the data warehouse. I would choose this option if I cannot be sure that the lookups would never recycle a lookup ID or remove a row.
- Store as is - normalized : store the clickstream data as-is (do not join on lookups and denormalize) and just update the lookup tables each hour. Assumes that the lookup tables will only ever get new ID's added and old ID's will not be recycled or removed.
There is nothing that I can see in the documentation. This biases me towards the "Denormalize and store' strategy. However this is much more complex process to build and requires considerably more storage. My initial checks seem to confirm that I will be OK to store as-is and assume that the lookup tables will only ever get new ID's added and old ID's will not be recycled or removed.
I would love confirmation - what is the Adobe Clickstream Data Model - so that I can make the best choice?
Thanks,
Matt