77

I am trying to convert a list of lists that looks like the following into a pandas DataFrame:

[['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'], 
['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], 
['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'], 
['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

I want each inner list to become one row of a pandas DataFrame with four columns. What would be the best approach? pd.DataFrame does not quite give me what I am looking for.

Emre
Aravind Veluchamy

4 Answers

89
import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

df = pd.DataFrame.from_records(data)
Emre
  • 17
    You could refine it a bit more with:

    DataFrame.from_records(data, columns=['Team', 'Player', 'whatever-stat-is-that', 'position'])

    – Juan Ignacio Gil Jan 11 '18 at 10:14
  • 1
    Is there a way to specify the imports more specifically? E.g. I want to specify that DataFrame["Team"] must refer to the first item of each sublist (i.e. data[i][0]) and DataFrame["Position"] to refer to the last item of each sublist (i.e. data[i][-1])? – Ivo Jan 17 '19 at 15:20
  • @Ivo: Use the columns parameter of DataFrame.from_records. – Emre Jan 17 '19 at 21:27
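As a rough sketch of what Ivo's comment asks about, one option is to pick the positions explicitly while building the frame. The column names `Team` and `Position` are taken from that comment; the dictionary-of-lists construction is an illustrative alternative, not something the answer itself proposes.

```python
import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'],
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'],
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

# Build each column from an explicit position in every sublist.
df = pd.DataFrame({
    "Team": [row[0] for row in data],       # first item of each sublist
    "Position": [row[-1] for row in data],  # last item of each sublist
})
```

Only the selected positions end up in the frame, so the player and salary values are dropped here.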
22

Once you have the data:

import pandas as pd

data = [['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'], 
        ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], 
        ['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'], 
        ['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

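You can create the DataFrame by transposing the data into columns and pairing each column with its name:

columns = ["Team", "Player", "Salary", "Role"]
data_transposed = zip(*data)  # unpack with * so zip yields one tuple per column
df = pd.DataFrame(dict(zip(columns, data_transposed)))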

Another way:

df = pd.DataFrame(data)  # each inner list is already one row, so no transpose is needed
df.columns = ["Team", "Player", "Salary", "Role"]
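The raw values in the question carry trailing spaces, literal double quotes, and newline characters. A minimal sketch of one way to clean them after building the frame, assuming the column names used above and that only the three text columns need cleaning:

```python
import pandas as pd

data = [['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'],
        ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'],
        ['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'],
        ['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

df = pd.DataFrame(data, columns=["Team", "Player", "Salary", "Role"])

# Strip surrounding whitespace and newlines from every text column.
for col in ["Team", "Player", "Role"]:
    df[col] = df[col].str.strip()

# Drop the literal double quotes wrapped around the player names.
df["Player"] = df["Player"].str.strip('"')
```

The Salary column is left untouched because it already holds integers.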
Paloma Manzano
  • Your transposition example fails for me with "ValueError: 4 columns passed, passed data had 1 columns". Why the transposition anyway when this works: df = pd.DataFrame(data, columns=["Team", "Player", "Salary", "Role"]). – Timothy C. Quinn Sep 06 '22 at 12:43
7

You can just directly define it as a data frame as follows:

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data)
LUSAQX
1

This one by far was the simplest:

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data)

Now, if the keys are in the first list of the list of lists (data[0]), you can assign them as column headers in the DataFrame like so:

import pandas as pd

data = [['key1', 'key2', 'key3', 'key4'], 
    ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
    ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
    ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data[1:], columns=data[0])
GManAsg