Here is the example data set
id firstname lastname email update date
A1 wendy smith [email protected] 2018-01-02
A1 wendy smith [email protected] 2019-02-03
A2 harry lynn [email protected] 2016-04-03
A2 harry [email protected] 2019-03-12
A3 tinna dickey [email protected] 2016-04-03
A3 tinna dickey [email protected] 2013-06-12
A4 Tom Lee [email protected] 2012-06-12
A5 Ella [email protected] 2019-07-12
A6 Ben Lang [email protected] 2019-03-12
I have sorted the data set by id
and update date
, I want to merge the rows with same id
, if one row with empty value, fill the other one with same id
, if confilct, use the latest one. For rows with no duplicate id
leave the empty cell as it is.
the output should be:
id firstname lastname email update date
A1 wendy smith [email protected] 2019-02-03
A2 harry lynn [email protected] 2019-03-12
A3 tinna dickey [email protected] 2019-03-12
A4 Tom Lee [email protected] 2012-06-12
A5 Ella [email protected] 2019-07-12
A6 Ben Lang [email protected] 2019-03-12
my attempt was using ffill()
to merge rows with empty and keep last duplicate, but the result seems to affect other cells which should have empty values(like lastname in A5 should be empty ).
df=df.ffill().drop_duplicates('id',keep='last')