0

CSV data format

1st Format

name,email,mobile,email
a,[email protected],1234567890,[email protected]

2nd Format
name,email,"mobile,number",email
a,[email protected],1234567890,[email protected]

3rd Format
name,email,"mobile number",email
a,[email protected],1234567890,[email protected]

In my above data format email is duplicate in header so I only want to keep first column value for email and second email data with header should be deleted from file.

I have tied this but it;s not working properly

awk  -F'","' 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){ v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}} ' input.csv > output.csv

Please suggest script command for the same

user13000875
  • 387
  • 2
  • 14

2 Answers2

0

Awk is probably not the most practical tool for manipulating CSV files. There are many others.

Here are a few examples, with your data

  • csvtool (sudo apt install csvtool)
$ for f in [123].csv; do echo $f; csvtool col 1-3 "$f"; echo; done
1.csv
name,email,mobile
a,[email protected],1234567890

2.csv
name,email,"mobile,number"
a,[email protected],1234567890

3.csv
name,email,mobile number
a,[email protected],1234567890
  • csvcut (sudo apt install csvkit)
$ for f in [123].csv; do echo $f; csvcut -C 4 "$f"; echo; done
1.csv
name,email,mobile
a,[email protected],1234567890

2.csv
name,email,"mobile,number"
a,[email protected],1234567890

3.csv
name,email,mobile number
a,[email protected],1234567890
  • Perl's Text::CSV (sudo apt install libtext-csv-perl)
    (This would probably be better suited for more complex needs, and should be in a more readable script file)
$ for f in [123].csv; do echo $f; perl -MText::CSV -lne 'BEGIN{$csv=Text::CSV_XS->new()} if ($csv->parse($_)) {$csv->print(*STDOUT, [ ($csv->fields)[0..2] ]);}' "$f"; echo; done
1.csv
name,email,mobile
a,[email protected],1234567890

2.csv
name,email,mobile,number
a,[email protected],1234567890

3.csv
name,email,mobile number
a,[email protected],1234567890
mivk
  • 13,452
  • 5
  • 76
  • 69
0

If your CSV is well-formed, try

sed 's/^\("\([^"]|""\)*"|\[^",]*\),\("\([^"]|""\)*"|\[^",]*\),\("\([^"]|""\)*"|\[^",]*\),\("\([^"]|""\)*"|\[^",]*\)$/\1,\3,\5/'

Demo: https://ideone.com/7xKlGU

The regex isn't particularly elegant but should work straightforwardly. "\([^"]\|""\)*" matches a quoted field and [^",]* matches a field which isn't quoted. This assumes that either a field in its entirety is quoted, or not at all, and that the escaping mechanism is doubling the double quotes which should be literal, as is the convention in most common CSV dialects.

tripleee
  • 175,061
  • 34
  • 275
  • 318