6

I have a large amount of data in which I need to count the measurements per ID. So far I have created a data frame from all the files and omitted the NAs; this part works properly. I wondered whether nrow() is the right function for this, but it returns a single number for the whole data frame, so it will not get me what I need.

What I am looking for is this: given entries like these,

1155 2010-05-02  2.7200    1
1156 2010-05-05  2.6000    3
1157 2010-05-08  2.6700    1
1158 2010-05-11  3.5700    2

I want to get a list like this:

ID          Number of observations
1           2
2           1
3           1
Cassandra

6 Answers

5

Using the data.table structure (see the wiki),

library(data.table)
D <- data.table(x = c(1155, 1156, 1157, 1158),
                date = as.Date(c("2010-05-02", "2010-05-05", "2010-05-08", "2010-05-11")),
                y = c(2.7200, 2.6000, 2.6700, 3.5700),
                id = c(1, 3, 1, 2))
counts <- D[, .(rowCount = .N), by = id]
counts

This will return

counts
##    id rowCount
## 1:  1        2
## 2:  3        1
## 3:  2        1
Stereo
BChan
3

Another way is simply to use the table() function.

ids<-c(1,3,1,2)
counts<-data.frame(table(ids))
counts
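
Applied to a data frame rather than a bare vector, the same idea might look like the sketch below. The data frame `df` and its column names are assumptions made up to mirror the question's sample rows; `responseName` only renames the default `Freq` column.

```r
# Hypothetical data frame mirroring the question's sample rows.
df <- data.frame(value = c(2.72, 2.60, 2.67, 3.57),
                 id    = c(1, 3, 1, 2))

# table() tallies each distinct id; as.data.frame() turns the tally into rows.
counts <- as.data.frame(table(ID = df$id), responseName = "observations")
counts
```

Note that the ID column of the result is a factor, so convert it if you need numeric IDs afterwards.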
1

OK, if I understood correctly, you can do something like:

# Add a helper column of ones, then sum it within each ID.
df$observations <- 1
# Assumes the ID column is named "ID"; adjust to your data frame.
new_data <- aggregate(observations ~ ID, data = df, FUN = sum)

Caution: this might not work exactly as written, since I am not sure what your data frame looks like.

1

aggregate() should work, as the previous answer suggests. Another option is with the plyr package:

library(plyr)
count(yourDF, c('id'))

Using more columns in the vector with 'id' will subdivide the count.

I believe ddply() (also part of plyr) can be combined with summarize to do this as well, similar to aggregate().
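
As a rough sketch of both suggestions, assuming plyr is installed: the data frame below reuses the question's IDs, and the `site` column is invented purely to show how a second column subdivides the count.

```r
library(plyr)

# Hypothetical data: the question's ids plus a made-up "site" column.
yourDF <- data.frame(id   = c(1, 3, 1, 2),
                     site = c("a", "a", "b", "a"))

# Count rows per id, then per id/site combination.
per_id  <- count(yourDF, "id")
by_both <- count(yourDF, c("id", "site"))

# The same per-id count with ddply() and summarize.
per_id_ddply <- ddply(yourDF, "id", summarize, n = length(id))
```

count() puts its result in a column named freq, while the ddply() version lets you pick the column name yourself.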

1

This is similar to Jeremy's but using dplyr:

library(dplyr)
mytable <-
"a    date        b         id
 1155 2010-05-02  2.7200    1
 1156 2010-05-05  2.6000    3
 1157 2010-05-08  2.6700    1
 1158 2010-05-11  3.5700    2"

mytable <- read.delim(textConnection(mytable), header=TRUE,  sep="")
mytable %>% count(id)
0

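The rle function can also do this if you don't want to install dplyr. Note that rle counts runs of consecutive equal values, so the IDs have to be sorted first:

runs <- rle(sort(mytable$id))
data.frame(id = runs$values, n = runs$lengths)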
Stephen Rauch