I am working on bringing our Clickstream data into a local database for deeper analysis, correlated with internal data. Is there a recommended combination of columns to use as a primary key? I assume it would be post_visid_high, post_visid_low, and one of the time-based columns like post_t_time_info. Or would date_time be better than post_t_time_info in this case? I can't tell what the difference is, other than that the "post_" prefix suggests the column is subject to post-processing rules.
Well, my goal is to find a primary key for the hit data. I need to be able to identify each row uniquely; a user ID alone won't be enough. Joining to my other datasets won't be a problem, but I need a PK on the table for optimization.
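
To make the requirement concrete, here is a rough sketch of how a candidate column combination could be tested for uniqueness before it is declared as the PK. Everything apart from the column names is an assumption for illustration: the staging table name hit_data_staging, the helper count_duplicate_keys, and the sample rows are made up, and SQLite only stands in for whatever local database is actually used.

```python
import sqlite3

# Candidate key columns taken from the question. Whether they are
# really unique per hit is exactly what this check is meant to test.
CANDIDATE_KEY = ("post_visid_high", "post_visid_low", "date_time")


def count_duplicate_keys(conn, table, key_columns):
    """Return how many distinct key values occur on more than one row."""
    cols = ", ".join(key_columns)
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


def load_sample(conn):
    """Create a hypothetical staging table (no PK yet) with made-up rows."""
    conn.execute(
        "CREATE TABLE hit_data_staging ("
        "post_visid_high TEXT, post_visid_low TEXT, "
        "date_time TEXT, pagename TEXT)"
    )
    rows = [
        ("111", "222", "2020-01-01 10:00:00", "home"),
        # Same visitor, same second: collides with the row above.
        ("111", "222", "2020-01-01 10:00:00", "search"),
        ("111", "222", "2020-01-01 10:00:05", "product"),
        ("333", "444", "2020-01-01 10:00:00", "home"),
    ]
    conn.executemany("INSERT INTO hit_data_staging VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load_sample(conn)
duplicates = count_duplicate_keys(conn, "hit_data_staging", CANDIDATE_KEY)
print(duplicates)
```

With the made-up rows, the visitor ID plus a second-resolution timestamp collides once, which is the case I'm worried about. Running the same check against the real feed would show whether a given combination is safe to use as the PK.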