3

I would like to create a data frame in which in the first column I will have all the dates from a certain period of time and in the second the number of events that occurred on each date including dates when no events occurred. I would also like to count the events to which specific factors have been assigned

The first data frame in which I have the events with dates for a given date:

Row Sex Age      Date
1    2   36   2004-01-05
2    1   47   2004-01-06
3    1   26   2004-01-10
4    2   23   2004-01-20
5    1   50   2004-01-27
6    2   35   2004-01-28
7    1   35   2004-01-30
8    1   38   2004-02-06
9    2   29   2004-02-11

Where in the column "Sex" 1 means female and 2 male.

Second data frame in which I have dates from the examined period:

Row    Date
 1  2004-01-05
 2  2004-01-06
 3  2004-01-07
 4  2004-01-08
 5  2004-01-09
 6  2004-01-10
 7  2004-01-11
 8  2004-01-12
 9  2004-01-13
10  2004-01-14

I want to get a data frame that looks like this:

Row    Date       Events (All)   Events (Female)   Events (Male)
 1  2004-01-05         1                0                1
 2  2004-01-06         1                1                0
 3  2004-01-07         0                0                0
 4  2004-01-08         0                0                0
 5  2004-01-09         0                0                0
 6  2004-01-10         0                1                0
 7  2004-01-11         0                0                0
 8  2004-01-12         0                0                0
 9  2004-01-13         0                0                0
10  2004-01-14         0                0                0

Could anyone help?

1 Answers1

0

I made the assumption that:

Events (All) = Events (Female) + Events (Male)

I also manipulated the data so that there are more than one event per day.

As a result, df looks like:

  Sex Age       Date
1   2  36 2004-01-05
2   1  47 2004-01-05
3   1  26 2004-01-10
4   2  23 2004-01-10
5   1  50 2004-01-27
6   2  35 2004-01-27
7   1  35 2004-01-30
8   1  38 2004-02-30
9   2  29 2004-02-30

The following code should achieve the desired results.

library(tidyr)
df = read.csv([Path to dataset], stringsAsFactors = FALSE) 

df %>% group_by(Date, Sex) %>%  
  summarise(sex_count = n()) %>% 
  spread(Sex,sex_count, fill=0) %>% 
  rename( event_female = '1', event_male = '2') %>%
  mutate(event_all = event_female + event_male ) %>%
  select(event_all, event_female, event_male)

output:

 Date       event_all event_female event_male
  <chr>          <dbl>        <dbl>      <dbl>
1 2004-01-05         2            1          1
2 2004-01-10         2            1          1
3 2004-01-27         2            1          1
4 2004-01-30         1            1          0
5 2004-02-30         2            1          1
```
nwaldo
  • 381
  • 2
  • 10
  • Thanks a lot. I've solved problem using this solution - https://datascience.stackexchange.com/questions/73451/how-to-calculate-events-per-day-in-r-including-dates-when-no-events-occurred . – Bartosz Dawidowski May 04 '20 at 09:01