SOLVED

Adobe Analytics Workspace vs Data Feeds - Count Comparison

Level 3

Hi All!

I was asked to compare counts between AA Workspace and Data Feeds. I tried to load the 650 MB TSV file (~300K rows and ~1.5K columns) into Google Sheets, but it appears to be too much for it to handle.

I loaded it in RStudio, but I have yet to add the header.

Any advice on how to add the header and other lookups in RStudio?

Alternatively, how do you/would you do such QA?

Thanks!
R

1 Accepted Solution

Correct answer by
Level 9

Hi @Rafael_Sahagun ,

You can follow these steps in RStudio:

  • Read hit_data.tsv into a dataframe; let's call it hit_data_df
  • Read column_headers.tsv into a dataframe; let's call it column_headers_df
  • Update the column names of hit_data_df with the command below:
    • colnames(hit_data_df) <- column_headers_df[1,]
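
The steps above could be sketched roughly as follows. This is a minimal, self-contained example that assumes neither file has a header row; the three columns and their values are made-up toy data written to a temporary folder, not a real feed. It wraps the header row in as.character(unlist(...)) so that a plain character vector is assigned, which is a defensive variation on the command above rather than a required change.

```r
# Toy stand-ins for the data feed files (assumption: real feeds have far more columns).
feed_dir <- tempfile("feed_")
dir.create(feed_dir)
writeLines(c("101\t12\thome", "102\t7\tcart", "103\t12\thome"),
           file.path(feed_dir, "hit_data.tsv"))
writeLines("hitid_low\tbrowser\tpost_pagename",
           file.path(feed_dir, "column_headers.tsv"))

# Read both files without treating the first row as a header.
# quote = "" stops stray quote characters in the data from swallowing rows.
hit_data_df <- read.delim(file.path(feed_dir, "hit_data.tsv"),
                          header = FALSE, quote = "", stringsAsFactors = FALSE)
column_headers_df <- read.delim(file.path(feed_dir, "column_headers.tsv"),
                                header = FALSE, quote = "", stringsAsFactors = FALSE)

# Flatten the single header row into a character vector and apply it.
colnames(hit_data_df) <- as.character(unlist(column_headers_df[1, ]))

print(colnames(hit_data_df))
print(dim(hit_data_df))
```

For a file of several hundred MB, a faster reader such as data.table::fread may be worth trying in place of read.delim; the renaming step stays the same.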

 

 

Cheers!

6 Replies

Level 9

Hi @Rafael_Sahagun ,

I forgot to add instructions for using the lookup tables. Continuing from the steps above, you can add the lookups of your choice as follows:

  • Read the lookup TSV file into a dataframe; let's take browser.tsv as an example and call the resulting dataframe browser_df
  • Use merge to add the browser_df columns to hit_data_df based on the common ID (for the browser lookup, the ID column in the data feed is "browser"). Run the commands below to do this:
    • colnames(browser_df) <- c("browserID", "browserName") # lookup tables don't have column names, so we add names of our choice; "browserID" becomes the joining key against "browser" in hit_data_df
    • hit_data_df <- merge(hit_data_df, browser_df, by.x = "browser", by.y = "browserID", all.x = true)
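
As a rough, self-contained sketch of the lookup step: the hit rows and the two browser names here are invented toy values, and the lookup is written to a temporary file only so the example runs anywhere. Note that R spells the logical constant in upper case, so the sketch passes all.x = TRUE. It also checks that the row count is unchanged after the join, since a lookup with duplicate IDs would silently multiply hits and distort any count comparison.

```r
# Toy hit data (assumption: column names were already applied from column_headers.tsv).
hit_data_df <- data.frame(
  hitid_low = c(101, 102, 103),
  browser   = c(12, 7, 99),   # 99 has no entry in the toy lookup
  stringsAsFactors = FALSE
)

# Toy lookup file: ID <tab> name, no header row.
lookup_path <- tempfile(fileext = ".tsv")
writeLines(c("7\tBrowser A", "12\tBrowser B"), lookup_path)

browser_df <- read.delim(lookup_path, header = FALSE, quote = "",
                         stringsAsFactors = FALSE)
colnames(browser_df) <- c("browserID", "browserName")

rows_before <- nrow(hit_data_df)

# Left join: keep every hit, fill browserName with NA where no lookup row matches.
hit_data_df <- merge(hit_data_df, browser_df,
                     by.x = "browser", by.y = "browserID", all.x = TRUE)

# A left join against a lookup with unique IDs must not change the hit count.
stopifnot(nrow(hit_data_df) == rows_before)

print(hit_data_df)
```

merge sorts the result by the join key by default, so the row order will differ from the original file; that does not affect counts.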

Cheers!

Level 3

Thanks a lot @Harveer_SinghGi1 !!!

Level 3
 
I wasn't able to get this to work:
hit_data_df <- merge(hit_data_df, browser_df, by.x = "browser", by.y = "browserID", all.x = true)
I received "Error: object 'true' not found"

The following seems to have worked, but I got there by trial and error, so I'm not sure it is correct:
 
hit_data_df <- merge(hit_data_df, browser_df, by = "browser", all.x = TRUE)
 
Thanks so much!
 
-R

Level 9

Hi @Rafael_Sahagun ,

My mistake. R doesn't recognize "true" as the Boolean TRUE; it has to be in upper case, as you have used.
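
On whether the by = "browser" form is correct: as far as I can tell, it gives the same result as the by.x / by.y form, provided the lookup's ID column is itself named "browser" (with a column named "browserID", merge would not find the key). A small sketch with invented toy data that compares the two forms and reproduces the lower-case error:

```r
# Toy data (assumption: values are made up for illustration).
hit_data_df <- data.frame(hitid_low = c(101, 102), browser = c(12, 7))
browser_df  <- data.frame(browserID = c(7, 12),
                          browserName = c("Browser A", "Browser B"))

# Form 1: different key names on each side.
merged_xy <- merge(hit_data_df, browser_df,
                   by.x = "browser", by.y = "browserID", all.x = TRUE)

# Form 2: rename the lookup key to "browser", then join with a single 'by'.
browser_renamed <- browser_df
colnames(browser_renamed) <- c("browser", "browserName")
merged_by <- merge(hit_data_df, browser_renamed, by = "browser", all.x = TRUE)

# Compare the two results.
print(identical(merged_xy, merged_by))

# Lower-case true is treated as an ordinary (undefined) variable name,
# so the call fails; capture the message instead of stopping the script.
msg <- tryCatch(
  merge(hit_data_df, browser_df,
        by.x = "browser", by.y = "browserID", all.x = true),
  error = function(e) conditionMessage(e)
)
print(msg)
```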

Cheers!

Level 3

Thanks so much!