I am working on a dataset that looks very similar to the one below:
transaction_id customer_id phone email
1 19 12345 [email protected]
2 19 00001 [email protected]
3 Guest 00001 [email protected]
4 22 12345 [email protected]
5 23 78900 [email protected]
The customers with IDs 19, Guest and 22 are actually the same person, as the values they share in the phone and email columns show.
Because one real customer can appear under several customer IDs, my goal is to find the related rows and assign each real customer a new unique ID, stored in a new unique_id column:
transaction_id customer_id phone email unique_id
1 19 12345 [email protected] 1
2 19 00001 [email protected] 1
3 Guest 00001 [email protected] 1
4 22 12345 [email protected] 1
5 23 78900 [email protected] 2
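One way to get this column is to treat each row as a node, join two rows whenever they share a key value, and label the resulting connected groups. A union-find (disjoint-set) structure does this, and it follows chains that no single groupby can: row 1 to row 2 through customer ID 19, row 2 to row 3 through phone 00001, row 1 to row 4 through phone 12345. Below is a minimal pure-Python sketch under several assumptions: the helper name `assign_unique_ids` is hypothetical; 'Guest' is treated as "no ID" rather than as a real customer; customer_id is used as a link key alongside phone and email (the expected output appears to need it to join rows 1 and 2); phones are kept as strings to preserve leading zeros; and the email addresses are invented placeholders, because the real ones are not visible in the sample.

```python
from __future__ import annotations

from typing import Hashable, Iterable, Optional, Sequence


def assign_unique_ids(
    key_columns: Sequence[Sequence[Optional[Hashable]]],
    ignore: Iterable[Hashable] = ("Guest",),
) -> list[int]:
    """Give rows linked by any chain of shared key values the same ID (1, 2, ...).

    key_columns holds one equal-length sequence per column (e.g. customer_id,
    phone, email). None and values in `ignore` (assumed convention) never link rows.
    """
    n_rows = len(key_columns[0])
    parent = list(range(n_rows))  # union-find forest over row positions

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the smaller row position as the root.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    skip = set(ignore)
    for column in key_columns:
        first_row_with: dict = {}  # value -> first row holding it, within this column
        for row, value in enumerate(column):
            if value is None or value in skip:
                continue
            if value in first_row_with:
                union(row, first_row_with[value])
            else:
                first_row_with[value] = row

    # Number the groups 1, 2, ... in order of first appearance.
    ids: dict[int, int] = {}
    return [ids.setdefault(find(row), len(ids) + 1) for row in range(n_rows)]


customer_id = [19, 19, "Guest", 22, 23]
phone = ["12345", "00001", "00001", "12345", "78900"]
# Hypothetical placeholder emails: the real addresses are not visible in the sample.
email = ["a@example.com", "b@example.com", "b@example.com", "a@example.com", "c@example.com"]
print(assign_unique_ids([customer_id, phone, email]))  # → [1, 1, 1, 1, 2]
```

With a DataFrame, the call would be something like `df['unique_id'] = assign_unique_ids([df[c].tolist() for c in ['customer_id', 'phone', 'email']])`, assuming missing values have been converted to `None` first, since this plain-Python code does not treat NaN as missing.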
The tricky part is that I can group by email, or by email and phone together, but neither approach captures all the related rows: transaction 2, for example, is always assigned a different unique ID from the others. This is what I tried:
df['unique_id'] = df.groupby('phone').ngroup()  # one ID per distinct phone value
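Grouping by phone gives one ID per distinct phone, so it only joins rows whose phone is identical; it cannot follow a chain such as row 1 to row 2 to row 3 that switches between columns. A pandas-only alternative, sketched here as one possible approach rather than a standard recipe, is to repeat `groupby(...).transform('min')` over each key column, spreading the smallest row label through every shared value until nothing changes. The function name `propagate_unique_ids`, the choice to ignore 'Guest', the use of customer_id as a link key, and the placeholder emails are all assumptions for illustration; the DataFrame index is also assumed to be unique.

```python
from __future__ import annotations

from typing import Hashable, Iterable, Sequence

import numpy as np
import pandas as pd


def propagate_unique_ids(
    df: pd.DataFrame,
    keys: Sequence[str],
    ignore: Iterable[Hashable] = ("Guest",),
) -> pd.Series:
    """Label rows linked by any chain of shared key values with the same ID (1, 2, ...).

    Missing values and values in `ignore` (hypothetical convention) never link rows.
    """
    ignore = list(ignore)
    # Start with every row in its own group, labelled by position.
    label = pd.Series(np.arange(len(df)), index=df.index)
    while True:
        previous = label
        label = label.copy()
        for key in keys:
            usable = df[key].notna() & ~df[key].isin(ignore)
            if not usable.any():
                continue
            # Smallest current label among the rows sharing each key value.
            group_min = label[usable].groupby(df.loc[usable, key]).transform("min")
            label.loc[usable] = group_min
        if label.equals(previous):  # fixed point: every linked row has the same label
            break
    # Renumber in order of first appearance: 1, 2, ...
    return pd.Series(pd.factorize(label)[0] + 1, index=df.index)


df = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5],
    "customer_id": [19, 19, "Guest", 22, 23],
    "phone": ["12345", "00001", "00001", "12345", "78900"],
    # Hypothetical placeholder emails: the real addresses are not visible in the sample.
    "email": ["a@example.com", "b@example.com", "b@example.com", "a@example.com", "c@example.com"],
})
df["unique_id"] = propagate_unique_ids(df, ["customer_id", "phone", "email"])
print(df["unique_id"].tolist())  # → [1, 1, 1, 1, 2]
print(propagate_unique_ids(df, ["phone"]).tolist())  # → [1, 2, 2, 1, 3]
```

The second call reproduces the original problem: with phone alone, transaction 2 ends up in a different group from transaction 1. Each pass is vectorised, but the number of passes grows with the longest chain of links, so on large or deeply chained data a union-find or a graph library's connected-components routine may be faster.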
I greatly appreciate your time and help.