split column (dates) based on another column(emails)

Question

Trying to create a new data_frame based on the order date column and emails. So if I have a duplicated email (e.g. [email protected] in the example below), I want to merge the emails and place the order_date variable in a new column next to it. I want to do this in the full DF. This will introduce many NAs but i will solve that problem later.

I have a dataframe as follows:

Source: local data frame [6 x 4]
Groups: email [5]

                email order_date `sum(price_excl_vat_euro)` `sum(total_qty)`
                <chr>     <date>                      <dbl>            <int>
1 [email protected] 2016-09-05                     140.48                2
2 [email protected] 2016-11-01                     41.31                1
3 [email protected] 2016-09-18                     61.98                1
4 [email protected] 2016-08-01                      61.98                1
5 [email protected] 2016-08-02                      61.98                1
6 [email protected] 2016-08-02                      140.49                1

What i want to obtain is (the other columns i do not care about for now):

email            order_date1    order_date2
[email protected]   2016-09-05     NA
[email protected]  2016-11-01     NA
[email protected]  2016-09-18     NA
[email protected]   2016-08-01     2016-08-02
[email protected]    2016-08-02     NA

It is important to know that the number of orders could vary between 1-10 (average). I tried the spread function from the tidyr package. But couldn't get it to work. Any hints are very appreciated!

score 3 · Answer 1 · answered Nov 15 '16 at 16:27

For example

df <- read.table(row.names=1, stringsAsFactors = F, text="
1 [email protected] 2016-09-05                     140.48                2
2 [email protected] 2016-11-01                     41.31                1
3 [email protected] 2016-09-18                     61.98                1
4 [email protected] 2016-08-01                      61.98                1
5 [email protected] 2016-08-02                      61.98                1
6 [email protected] 2016-08-02                      140.49                1")
df <- df[order(df[,1], df[,2]), ]
lst <- split(df[,2],df[,1])
do.call(rbind, lapply(lst, "length<-", max(lengths(lst))))
#                 [,1]         [,2]        
# [email protected]  "2016-08-01" "2016-08-02"
# [email protected]   "2016-08-02" NA          
# [email protected] "2016-11-01" NA          
# [email protected]  "2016-09-05" NA          
# [email protected] "2016-09-18" NA

or

library(tidyverse)
df %>% 
  arrange(V2, V3) %>% 
  group_by(V2) %>% 
  transmute(V3, date=paste0("date", 1:n())) %>% 
  spread(date, V3)
# Source: local data frame [5 x 3]
# Groups: V2 [5]
# 
#                V2      date1      date2
# *           <chr>      <chr>      <chr>
# 1  [email protected] 2016-08-01 2016-08-02
# 2   [email protected] 2016-08-02       <NA>
# 3 [email protected] 2016-11-01       <NA>
# 4  [email protected] 2016-09-05       <NA>
# 5 [email protected] 2016-09-18       <NA>

Thanks this works indeed. Can you maybe also tell me how i can keep the other columns? Guess adding them to arrange should do the job. — Sander Van der Zeeuw, Nov 16 '16 at 09:10
Sum them up and merge them afterwards ( for example). Or: Post a new question and specify the exact desired output (which I don't know) — lukeA, Nov 16 '16 at 09:54
Well after splitting them i have a order lets say 2 times. So the email from that user appears as 2 rows, where order_date is filled in first row date1 and second row date2. If i have multiple orders it continues 3rd row date3. Basically i want to merge these columns into 1. based on transaction_ids — Sander Van der Zeeuw, Nov 16 '16 at 10:03

split column (dates) based on another column(emails)

1 Answers1