8

How can I transform the following DataFrame into one with cities as rows and each cuisine as a column, and 1 or 0 as values (1 if the city has that kind of cuisine)?

I think this turns out to be a very common problem in transforming data into features for machine learning.

I am aware of the Pandas pivot_table functionality, but it asks for a value column, and in this case we don't have any.

import pandas as pd

data = {
    'city': ['NY','NY', 'SF','SF','SF'],
    'cuisine': ['Japanese', 'Chinese', 'French', 'Japanse', 'German']
}

df = pd.DataFrame(data)
blue-dino
  • 383
  • 2
  • 3
  • 11

3 Answers3

8

If there is no value column - introduce it yourself!

df["value"]=1
pd.pivot_table(df, values="value", index=["city"], columns="cuisine", fill_value=0) 

For your example I got (after fixing the misprint in 'Japanse' to 'Japanese')

cuisine  Chinese  French  German  Japanese
city                                      
NY             1       0       0         1
SF             0       1       1         1
lanenok
  • 1,516
  • 9
  • 9
4

Here is a more pythonic way:

df.pivot_table(index=["city"], columns="cuisine", aggfunc=lambda x: 1, fill_value=0)

This is a late answer, but it is for people who meet the same probelm.

sac7e
  • 41
  • 1
0

In recent Pandas versions there is a method for exactly this use-case:

https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html

pd.get_dummies(df.set_index(["city"]).cuisine)

Result:

      Chinese  French  German  Japanese  Japanse
city                                            
NY          0       0       0         1        0
NY          1       0       0         0        0
SF          0       1       0         0        0
SF          0       0       0         0        1
SF          0       0       1         0        0
florian
  • 131
  • 3