Automatically match and replace word & position with its replacement

I am working with Google Analytics conversion path data in R. My imported data frame looks like the following example:

    Channel_Path                                | Source_Path Social > Email > Social > Paid Search > Social  | facebook > mailtool > m.facebook.com > google > facebook+instagram Organic Search > Email > Social                 | google > mailtool > pinterest 

As you can see, the different channels are separated by a ">" symbol. What I would like to do is the following:

Replace "Social" in the "Channel_Path" column by the corresponding value in the "Source_Path" column without changing any other values. This should happen to all rows in the dataset.

The result should then look like the following:

      Channel_Path                                                   | Source_Path facebook > Email > m.facebook.com > Paid Search > facebook+instagram | facebook > mailtool > m.facebook.com > google > facebook+instagram Organic Search > Email > pinterest                                   | google > mailtool > pinterest 

The issue I have here is that I am working with a large dataset (60.000 rows) and that I don’t know how to automatically replace the values depending on their position.

For a better reproducability, here is the code for the example given above:

df <- data.frame(Channel_Path = c("Social > Email > Social > Paid Search > Social", "Organic Search > Email > Social"),              Source_Path = c("facebook > mailtool > m.facebook.com > google > facebook+instagram", "google > mailtool > pinterest")) 

Thank you!

Asked on August 30, 2020 in R.
Add Comment
2 Answer(s)

We can get the data in long format separating the columns on " > ", replace Channel_Path values where Channel_Path == 'Social' and paste the values again.

library(dplyr)  df %>%   mutate(row = row_number()) %>%   tidyr::separate_rows(Channel_Path, Source_Path, sep = " > ") %>%   mutate(Channel_Path = ifelse(Channel_Path == 'Social',                                 Source_Path, Channel_Path)) %>%   group_by(row) %>%   summarise(across(.fns = ~paste(., collapse = " > "))) %>%   select(-row)   #                                                          Channel_Path #1 facebook > Email > m.facebook.com > Paid Search > facebook+instagram #2                                   Organic Search > Email > pinterest #                                                         Source_Path #1 facebook > mailtool > m.facebook.com > google > facebook+instagram #2                                      google > mailtool > pinterest 
Answered on August 30, 2020.
Add Comment

input:

df <- data.frame(Channel_Path = c("Social > Email > Social > Paid Search > Social", "Organic Search > Email > Social"),          Source_Path = c("facebook > mailtool > m.facebook.com > google > facebook+instagram", "google > mailtool > pinterest")) 

function:

library(tidyr) library(dplyr) library(stringr)  google_analytics <- function(col1,col2){ str1 <- str_split(col1," > ")[[1]] str2 <- str_split(col2," > ")[[1]] result <- "" for(i in 1:length(str1)){   if(str1[i]=="Social"){     str1[i] <- ifelse(str2[i] %in% c("facebook+instagram","m.facebook.com"),"facebook",str2[i])   }   if(i==length(str1)){     result <- paste0(result, str1[i])     next   }   result <- paste0(result, str1[i], " > ") }  return(result) }  df <- df %>% rowwise() %>% dplyr::mutate(Channel_Path=google_analytics(Channel_Path,Source_Path)) 

output:

Channel_Path                                   Source_Path                                                <chr>                                          <chr>                                                    1 facebook > Email > facebook > Paid Search > f~ facebook > mailtool > m.facebook.com > google > faceboo~ 2 Organic Search > Email > pinterest             google > mailtool > pinterest 
Answered on August 30, 2020.
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.