I am trying to read a large log file whose rows were written with different delimiters (because of legacy changes).
This code works:
import os, subprocess, time, re
import pandas as pd
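for root, dirs, files in os.walk('.', topdown=True):
    for file in files:
        # Join with root so that files in subdirectories are found too
        path = os.path.join(root, file)
        # Split on any run of commas, pipes, semicolons, colons, spaces, or tabs
        df = pd.read_csv(path, sep='[,|;: \t]+', header=None, engine='python', skipinitialspace=True)
        for index, row in df.iterrows():
            print(row[0], row[1])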
This works well for the following data:
[email protected] address1
[email protected];address2
[email protected],address3
[email protected];;address4
[email protected],,address5
Issue #1: the following row in the input file breaks the code, because the `;` inside the address is also treated as a delimiter and the row ends up with 3 fields. I want it parsed into 2 columns (not 3):
[email protected],,address;6
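One possible direction for Issue #1, as a minimal sketch rather than a definitive fix: split each line only at the first run of delimiters, so anything after it stays in the second column. The helper name `split_first` and the `example.com` sample lines are hypothetical stand-ins, and the delimiter set is copied from the `sep` pattern above. It assumes the first field never contains a delimiter character.

```python
import re

# Same delimiter set as the sep pattern passed to read_csv
DELIMS = re.compile(r'[,|;: \t]+')

def split_first(line):
    """Split a line into exactly two fields at the first run of delimiters."""
    parts = DELIMS.split(line.rstrip('\r\n'), maxsplit=1)
    if len(parts) == 1:
        # No delimiter found: keep the row shape with an empty second field
        parts.append('')
    return parts

# Hypothetical sample lines standing in for the log file
sample_lines = [
    'user1@example.com address1\n',
    'user5@example.com,,address5\n',
    'user6@example.com,,address;6\n',
]
rows = [split_first(line) for line in sample_lines]
```

The resulting list of two-element rows can then be passed to `pd.DataFrame(rows)` instead of calling `read_csv` with a regex separator.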
Issue #2: I want to replace all single and double quotes in the address column, but neither of the following seems to work:
df[1]=df[1].str.replace('"','DQUOTES')
df.replace('"', 'DQUOTES', regex=True)
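For Issue #2, a minimal sketch of what might work, assuming the quotes are still present in column 1 after parsing. Two things to note: `df.replace(...)` returns a new DataFrame, so its result has to be assigned back, and the first attempt only targets double quotes. The sample frame and the `SQUOTES` placeholder are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the parsed file
df = pd.DataFrame({
    0: ['a@example.com', 'b@example.com'],
    1: ['"address1"', "it's address2"],
})

# Replace each quote type literally (regex=False) and assign the result back
df[1] = (
    df[1]
    .str.replace('"', 'DQUOTES', regex=False)
    .str.replace("'", 'SQUOTES', regex=False)
)
```

If the quotes are already missing right after `read_csv`, the parser's quote handling may be the cause rather than the replacement step, so it is worth printing the raw column first.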
Please help!