Data Feeds Count Page Views, Visits, and other Metrics Using R Studio

Question

Hello!

To keep it simple for now I want to compare counts of page views first.

Here is how I loaded the data. The "\" escape character seemed to be messing up the tribble so I removed it using escape_backslash = TRUE

library(tidyverse)

# read tsv delim
hit_data_df <- read_delim(
  "hit_df.tsv",
  delim = "\t",
  quote = "",
  col_names = FALSE,
  escape_backslash = TRUE,
  na = c("", "NA")
)

# read tsv headers 
headers <- read_delim(
  "column_headers.tsv",
  delim = "\t",
  quote = "",
  col_names = FALSE,
  escape_backslash = TRUE
)


# insert headers in hit_data_df
col_names <- as.character(headers[1, ])
colnames(hit_data_df) <- col_names

Based on the definition for page view count in Data Feeds calculate metrics:(https://experienceleague.adobe.com/en/docs/analytics/export/analytics-data-feed/data-feed-contents/datafeeds-calculate) 'Count the number of rows where a value is in post_pagename or post_page_url'

The line below matched the number of ocurrences 99% in AA wokspace for the same date range (1 hour)

page_name_pv <- hit_data_df %>% select(post_pagename)

Looking back at the definition 'where a value is in post_pagename' I decided to remove NA thinking it would match page view counts

page_name_pv_na_om <- hit_data_df %>% select(post_pagename) %>%
  na.omit()

...but that showed only 20% of the page views I see in AA Workspace for the same hour.

Furthermore it seems I still need to exclude_hit = 0 which will lead to even less counts?

This looks a bit counter intuitive as it kept the zeroes, what I did is to include "0" right?

hit_data_df_ih0 <- hit_df_look %>% 
  filter(exclude_hit == "0")

The below would show 'Y' in all the rows I saw, not sure if there were other values.

  hit_data_df_xh0 <- hit_df_look %>% 
  filter(exclude_hit != "0")

Would be great to know what am I doing wrong or if the data is not good (the reason I was asked to compare).

Thanks!

R

Harveer_SinghGi1 · Accepted Answer

Could it be that the data frame doesn't show the correct order once in R because 650MB is too much data for R Studio to handle? thanks again for the help!

Hi @rafael_sahagun ,

While reading delimited data whenever you see issues like unexpected values flowing into columns where preset values are expected it is almost always due to the delimiter being present in one of the column values, in your case it seems the tab delimiter (\t) is recorded in one of the values that are returned in the data feed.

I'll suggest you narrow down to row from where you start seeing the unexpected values and the row above should be the one with such values containing tab delimiter in one of values. After identifying the rows you can figure out how to drop that row and keep remaining ones.

Cheers!

Sukrity_Wadhwa · Answer

Hi @rafael_sahagun,Were you able to resolve this query with the help of the provided solutions, or do you still need further assistance? Please let us know. If any of the answers were helpful in moving you closer to a resolution, even partially, we encourage you to mark the one that helped the most as the 'Correct Reply.'Thank you!

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded