How to remove duplicate values from the column with pandas

Question

I have a dataset like this:

mail id          score
[email protected]     10
[email protected]     13
[email protected]     16
[email protected]     20
[email protected]     19
[email protected]     24

From the above data, I have to remove duplicate mail ids by comparing the score column.

E.g., in the mail id column, [email protected] and [email protected] each appear twice. Here, we need to remove the duplicates by comparing their scores.

For [email protected], which has scores 10 and 16, the row with the greater value should be returned.

Expected output:

mail id          score
[email protected]     16
[email protected]     20
[email protected]     19
[email protected]     24

Anurag Dabas · Answer 1 · 2021-04-20T12:26:36.283


Sort by score in descending order with the sort_values() method, then keep only the first row per mail id with the drop_duplicates() method:

resultdf = df.sort_values('score', ascending=False).drop_duplicates('mail id')
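As a minimal sketch of the line above on the question's data: the addresses below are hypothetical placeholders (the real ones are not visible in the question), and the trailing sort_index() is an optional addition that only restores the original row order.

```python
import pandas as pd

# Hypothetical addresses standing in for the ones in the question.
df = pd.DataFrame({
    "mail id": ["a@example.com", "b@example.com", "a@example.com",
                "b@example.com", "c@example.com", "d@example.com"],
    "score": [10, 13, 16, 20, 19, 24],
})

resultdf = (
    df.sort_values("score", ascending=False)  # highest score first
      .drop_duplicates("mail id")             # keep the first (highest) row per mail id
      .sort_index()                           # optional: back to the original row order
)
print(resultdf)
```

Because drop_duplicates() keeps the first occurrence by default, sorting in descending order first is what makes the surviving row the one with the greatest score.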

OR

You can also do this with the groupby() method:

resultdf = df.groupby('mail id')['score'].nlargest(1).droplevel(1).reset_index()
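A small sketch of the groupby() variant, again with hypothetical placeholder addresses. The second statement, using max(), is not part of the answer; it is an assumed equivalent that should give the same result when only the score column is needed.

```python
import pandas as pd

# Hypothetical addresses standing in for the ones in the question.
df = pd.DataFrame({
    "mail id": ["a@example.com", "b@example.com", "a@example.com",
                "b@example.com", "c@example.com", "d@example.com"],
    "score": [10, 13, 16, 20, 19, 24],
})

# nlargest(1) keeps the top score per group; the result is indexed by
# (mail id, original row label), so droplevel(1) drops the row label.
resultdf = df.groupby("mail id")["score"].nlargest(1).droplevel(1).reset_index()

# Assumed alternative (not from the answer): take the per-group maximum directly.
alt = df.groupby("mail id", as_index=False)["score"].max()

print(resultdf)
print(alt)
```

Both forms return only the mail id and score columns, ordered by mail id; if the frame had further columns to keep, the sort_values()/drop_duplicates() approach would preserve them.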

edited Apr 20 '21 at 12:26

answered Apr 20 '21 at 12:20

Anurag Dabas



yop, like https://stackoverflow.com/a/40629420/2901002 – jezrael Apr 20 '21 at 12:21

only difference is drop_duplicates by 2 columns – jezrael Apr 20 '21 at 12:22
Oh, I didn't notice. Thanks again @jezrael :) – Anurag Dabas Apr 20 '21 at 12:30
ya, best remove, but it is up to you – jezrael Apr 20 '21 at 12:30
