I have a list :
citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']
and a pandas Dataframe df1 with these values
first last city email duration
John Travis New York [email protected] 5.5
Jim Perterson San Francisco, Los Angeles [email protected] 6.8
Nancy Travis Chicago [email protected] 1.2
Jake Templeton Los Angeles [email protected] 4.9
John Myers New York [email protected] 1.9
Peter Johnson San Francisco, Chicago [email protected] 2.3
Aby Peters Los Angeles [email protected] 1.8
Amy Thomas San Francisco [email protected] 8.8
Jessica Thompson Los Angeles, Chicago, New York [email protected] 4.2
I want to count the number of times each city from citylist occurs in the dataframe column 'city' (this portion have it working, thanks to @scott-boston for answer in my prior question )
(df1['city'].str.split(', ')
.explode()
.value_counts(sort=False)
.reindex(citylist, fill_value=0))
Additionally I want to sum by column 'duration' and group by city and calculate percent (sum of duration for group)/(total duration)
city list duration %time
New York 3 11.6 0.31
San Francisco 3 17.9 0.47
Los Angeles 4 17.7 0.47
Chicago 3 7.7 0.20
Miami 0 0 0