I have to check a column in a CSV file, keeping the valid emails and removing invalid data from that column. I already have an AWK command with a simple regex, but some invalid emails are not filtered out by it. The command is below:
awk 'BEGIN{FS=OFS=","}{$1=match($1,/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}/)?substr($1,RSTART,RLENGTH):"";print}'
I want to replace this regex with an RFC 5322 compliant one. I found the following regex, but it doesn't work when I add it to the above awk command. How can I insert this regex pattern into the above AWK command?
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
The CSV sample is below:
[email protected],abd
[email protected],534
[email protected],5rfrf
[email protected],54rf
[email protected],54r4
[email protected],5443
[email protected],344545
[email protected],64
[email protected],54444
I tried the command below:
awk 'BEGIN{FS=OFS=","}{$1=match($1,/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}/)?substr($1,RSTART,RLENGTH):"";print}'
Expected output:
[email protected],abd
[email protected],534
,5rfrf
[email protected],54rf
,54r4
[email protected],5443
,344545
,64
[email protected],54444
john@,4355
([email protected], [email protected], [email protected], [email protected] and john@ are not valid emails, so they are removed.)
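
For context on why the pasted pattern likely fails in awk: it uses PCRE-only non-capturing groups `(?:...)`, it contains a single quote that ends the shell-quoted awk program, and its `\x..` escapes are not portable across awk implementations. A minimal sketch of one possible adaptation follows. It is a simplification, not a full RFC 5322 validator: it keeps only the dot-atom local part and the hostname domain, and drops the quoted-string and bracketed IP-literal alternatives. The function name `filter_emails` is hypothetical, and unlike the original command it validates the whole field (anchored with `^` and `$`) rather than extracting a matching substring.

```shell
# Hypothetical helper: reads CSV on stdin (or from files given as arguments)
# and blanks column 1 unless the whole field looks like a valid address.
filter_emails() {
    awk 'BEGIN {
        FS = OFS = ","
        # \047 is the octal escape for a single quote, so the shell quoting stays intact.
        atom  = "[a-z0-9!#$%&\047*+/=?^_`{|}~-]+"
        # One DNS label: starts and ends with an alphanumeric, hyphens allowed inside.
        label = "[a-z0-9]([a-z0-9-]*[a-z0-9])?"
        # Plain ( ) groups replace the PCRE-only (?: ) groups.
        re = "^" atom "(\\." atom ")*@(" label "\\.)+" label "$"
    }
    {
        # The pattern is lowercase-only, so compare against a lowercased copy;
        # the field keeps its original case when it passes.
        if (tolower($1) !~ re) $1 = ""
        print
    }' "$@"
}
```

Assuming the data is in a file named `file.csv` (a placeholder name), it could be run as `filter_emails file.csv` or `filter_emails < file.csv`. Building the regex from strings avoids escaping `/` inside a regex literal and keeps each part readable.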