I have the following DataFrame:
```python
import pandas as pd
URL_WITH_EMAILS_DF = pd.DataFrame(data=[
    {'main_url': 'http://keilstruplund.dk', 'emails': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]',
                                                       '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']},
    {'main_url': 'http://kirsebaergaarden.com', 'emails': ['[email protected]', '[email protected]']},
    {'main_url': 'http://koglernes.dk', 'emails': ['[email protected]']},
    {'main_url': 'http://kongehojensbornehave.dk', 'emails': []},
])
```
I want to keep, in each row's `emails` list, only the addresses whose domain (the part after '@') matches the domain of that row's `main_url` (the part after "http://"). The result should be the following DataFrame:
```python
URL_WITH_EMAILS_DF = pd.DataFrame(data=[
    {'main_url': 'http://keilstruplund.dk', 'emails': ['[email protected]']},
    {'main_url': 'http://kirsebaergaarden.com', 'emails': ['[email protected]']},
    {'main_url': 'http://koglernes.dk', 'emails': ['[email protected]']},
    {'main_url': 'http://kongehojensbornehave.dk', 'emails': []},
])
```
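A minimal sketch of the filtering rule as a plain list comprehension. It assumes every `main_url` has the form `scheme://host` (no `www.` prefix or path) and compares domains case-insensitively. `filter_emails` is a hypothetical helper name, and the addresses are hypothetical placeholders because the ones above are obfuscated.

```python
import pandas as pd
from urllib.parse import urlparse

def filter_emails(main_url, emails):
    # Host part of the URL, e.g. 'keilstruplund.dk' for 'http://keilstruplund.dk'
    domain = urlparse(main_url).netloc.lower()
    # Keep only addresses whose text after the last '@' equals that host
    return [e for e in emails if e.rpartition('@')[2].lower() == domain]

# Hypothetical addresses standing in for the obfuscated ones in the question
df = pd.DataFrame(data=[
    {'main_url': 'http://keilstruplund.dk',
     'emails': ['info@keilstruplund.dk', 'someone@gmail.com']},
    {'main_url': 'http://kirsebaergaarden.com',
     'emails': ['kontakt@kirsebaergaarden.com', 'x@example.com']},
    {'main_url': 'http://koglernes.dk', 'emails': ['mail@koglernes.dk']},
    {'main_url': 'http://kongehojensbornehave.dk', 'emails': []},
])

# Iterate over plain Python values instead of using DataFrame.apply(axis=1)
df['emails'] = [filter_emails(u, e) for u, e in zip(df['main_url'], df['emails'])]
```

Iterating with `zip` over the two columns avoids building a Series per row, which `DataFrame.apply(axis=1)` would do.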
Any hints or approaches are appreciated, keeping in mind that I need to apply this transformation to millions of rows.
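For millions of rows, one option worth benchmarking is to explode the lists so the domain comparison runs through pandas string methods, then regroup the survivors into lists. This sketch assumes the DataFrame has a unique index and that URLs look like `http://host` or `https://host`. The addresses are hypothetical placeholders. Whether it is faster than a plain list comprehension depends on the data, so timing both on a sample is advisable.

```python
import pandas as pd

# Hypothetical addresses standing in for the obfuscated ones in the question
df = pd.DataFrame(data=[
    {'main_url': 'http://keilstruplund.dk',
     'emails': ['info@keilstruplund.dk', 'someone@gmail.com']},
    {'main_url': 'http://kirsebaergaarden.com',
     'emails': ['kontakt@kirsebaergaarden.com', 'x@example.com']},
    {'main_url': 'http://koglernes.dk', 'emails': ['mail@koglernes.dk']},
    {'main_url': 'http://kongehojensbornehave.dk', 'emails': []},
])

# Row domain: strip the scheme and any trailing slash (assumes 'http(s)://host')
domain = (df['main_url'].str.replace(r'^https?://', '', regex=True)
          .str.rstrip('/').str.lower())

# One row per (original index, email); an empty list becomes a single NaN
exploded = df['emails'].explode()

# Text after the last '@' of each email (NaN stays NaN)
email_domain = exploded.str.rsplit('@', n=1).str[-1].str.lower()

# Repeat each row's domain to line up with the exploded rows, then compare
keep = email_domain.eq(domain.reindex(exploded.index)).to_numpy()

# Regroup surviving emails by original row; rows with no match are missing here
filtered = exploded[keep].groupby(level=0).agg(list)

# Restore every original row, using [] where nothing survived
df['emails'] = [x if isinstance(x, list) else [] for x in filtered.reindex(df.index)]
```

The final `reindex` is what keeps rows such as `kongehojensbornehave.dk`, whose list was empty to begin with, as `[]` rather than dropping them.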