
I have tab-delimited files with this kind of table:

mytext <- "colA colB colC colD
ID1 valB1 valC1
ID2 valB2 valC2
ID3 valB3 valC3 valD3
ID4 valB4 valC4 valD4"

The first line contains the header with a fixed number of fields. The following lines contain at most that many fields, but sometimes fewer, i.e. trailing fields can be missing. With read.csv() I can just add fill=TRUE and it works perfectly, but fread() does not have this option. Even if I explicitly specify, for example, header=TRUE, skip='colA', fread() skips the first lines, including the header, and treats the 'ID3' line as the header. What is the recommended solution for this? I would like to avoid the double conversion data.table(read.csv(...)), as it is quite slow.

Vasily A
  • Fix your file, e.g. on Linux something like `fread("awk 'BEGIN {OFS=\",\"} NR == 1 {ncol = NF} {for (i = NF+1; i <= ncol; i++) $i = \"\"; print}' yourfile")` – eddi Dec 09 '15 at 23:03
  • Well, unfortunately I have to run my script under Windows. I'll have to install something like GnuWin, and processing the thousands of files I have will probably take a while... But I will keep this solution in mind if nothing else appears, thanks for your help! – Vasily A Dec 10 '15 at 02:13
  • The `fill` option is now available in the development version of "data.table". Install using `install.packages("data.table", repos = "https://Rdatatable.github.io/data.table", type = "source")` and use `fread(..., fill = TRUE)` to get the behavior you want. – A5C1D2H2I1M1N2O1R2T1 Mar 09 '16 at 11:02
  • thanks Ananda! now it works indeed – Vasily A Mar 10 '16 at 04:50
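
For reference, the fix described in the comments can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: it assumes an installed "data.table" version that supports both `fill` and the `text=` argument of `fread()`, and it rewrites the sample table from the question with explicit `\t` separators (an assumption about how the real files are laid out).

```r
library(data.table)

# Sample table from the question, written with explicit tab separators
# (assumed layout). The first two data rows lack the last field.
mytext <- "colA\tcolB\tcolC\tcolD
ID1\tvalB1\tvalC1
ID2\tvalB2\tvalC2
ID3\tvalB3\tvalC3\tvalD3
ID4\tvalB4\tvalC4\tvalD4"

# fill = TRUE pads rows that have fewer fields than the header,
# so the header line is kept and all four data rows are read.
dt <- fread(text = mytext, sep = "\t", header = TRUE, fill = TRUE)
print(dt)
```

For real files, the same call would presumably take the file path instead of `text = mytext`, e.g. `fread("yourfile", sep = "\t", header = TRUE, fill = TRUE)`.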
