31

Suppose I have a 5x3 data frame in which the third column contains missing values:

1 2 3
4 5 NaN
7 8 9
3 2 NaN
5 6 NaN

I want to fill in each missing value using the rule: first column times second column.

1 2 3
4 5 20 <-- 4*5
7 8 9
3 2 6 <-- 3*2
5 6 30 <-- 5*6

How can I do this with a data frame? Thanks.

How can I add a condition to the missing-value calculation, like this?

if 1st % 2 == 0 then 3rd = 1st * 2nd else 3rd = 1st + 2nd

1 2 3
4 5 20 <-- 4*5 because 4%2==0
7 8 9
3 2 5 <-- 3+2 because 3%2==1
5 6 11 <-- 5+6 because 5%2==1
KyL

6 Answers

33

Assuming the three columns of your dataframe are a, b and c, this is what you want:

df['c'] = df.apply(
    lambda row: row['a']*row['b'] if np.isnan(row['c']) else row['c'],
    axis=1
)

Full code:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9], [3, 2, np.nan], [5, 6, np.nan]]),
    columns=['a', 'b', 'c']
)
df['c'] = df.apply(
    lambda row: row['a']*row['b'] if np.isnan(row['c']) else row['c'],
    axis=1
)
Icyblade
  • A few years late but this only works when the columns are numeric. np.isnan does not support non-numeric data. It's not an issue here as the OP had numeric columns and arithmetic operations but otherwise pd.isnull is a better alternative. – Adarsh Chavakula Jan 03 '20 at 21:50
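
To illustrate the comment above, here is a minimal sketch with a made-up frame of string columns (the data and the concatenation rule are assumptions for illustration, not from the question). np.isnan raises a TypeError when it is handed a string, while pd.isnull accepts any object:

```python
import pandas as pd

# Hypothetical frame: every column holds strings, and 'c' has one missing entry.
df = pd.DataFrame(
    {"a": ["x", "y", "z"], "b": ["1", "2", "3"], "c": ["x1", None, "z3"]}
)

# pd.isnull works on object values; np.isnan(row['c']) would raise TypeError here.
df["c"] = df.apply(
    lambda row: row["a"] + row["b"] if pd.isnull(row["c"]) else row["c"],
    axis=1,
)
result = df["c"].tolist()
```

Only the row with the missing 'c' is rebuilt from 'a' and 'b'; the other rows keep their original value.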
14

What about using the fillna() method of the dataframe?

# fillna returns a new Series, so assign the result back
df['C'] = df['C'].fillna(df.A * df.B)
Ethan
Y K
6

Another option:

df.loc[(pd.isnull(df.C)), 'C'] = df.A * df.B
Ethan
Vishal
5

Assuming that the three columns in your dataframe are a, b and c, you can do the required operation like this:

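values = df['a'] * df['b']
# Keep the product where c is missing, otherwise keep the existing c.
# (c == np.nan is always False, so the mask must come from isna().)
df['c'] = values.where(df['c'].isna(), other=df['c'])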
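
A detail worth checking with any where-based fill: NaN never compares equal to anything, including itself, so the mask has to come from isna() (or pd.isnull), and Series.where takes the replacement through its `other` parameter. A small sketch on the question's data, with variable names of my own choosing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 4, 7, 3, 5], "b": [2, 5, 8, 2, 6], "c": [3, np.nan, 9, np.nan, np.nan]}
)

# Equality against NaN is False for every row, even the missing ones.
eq_mask = df["c"] == np.nan
# isna() flags the missing entries.
na_mask = df["c"].isna()

# where keeps the product where the mask is True and takes `other` elsewhere.
filled = (df["a"] * df["b"]).where(na_mask, other=df["c"])
```

With the equality mask, every row would fall through to `other` and the missing values would stay missing.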
enterML
0
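for i in df.index:
    if pd.isnull(df.loc[i, "c"]):  # only touch rows where c is missing
        if df.loc[i, "a"] % 2 == 0:
            df.loc[i, "c"] = df.loc[i, "a"] * df.loc[i, "b"]
        else:
            df.loc[i, "c"] = df.loc[i, "a"] + df.loc[i, "b"]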

A simple solution for beginners.

0

In the first case you can simply use fillna:

df['c'] = df.c.fillna(df.a * df.b)

In the second case, one option is to create a temporary column:

df['temp'] = np.where(df.a % 2 == 0, df.a * df.b, df.a + df.b)
df['c'] = df.c.fillna(df.temp)
df.drop('temp', axis=1, inplace=True)
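
The temporary column is not strictly required. As a sketch of the same idea (the name `candidate` is mine, not from the answer), the np.where result can be wrapped in a Series that shares the frame's index and passed straight to fillna:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 4, 7, 3, 5], "b": [2, 5, 8, 2, 6], "c": [3, np.nan, 9, np.nan, np.nan]}
)

# Candidate value for every row: product when 'a' is even, sum otherwise.
candidate = pd.Series(
    np.where(df["a"] % 2 == 0, df["a"] * df["b"], df["a"] + df["b"]),
    index=df.index,
)

# fillna only uses the candidate where 'c' is missing.
df["c"] = df["c"].fillna(candidate)
```

Reusing df.index matters because fillna aligns on the index, so the sketch also works when the frame does not have the default 0..n-1 index.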
Mykola Zotko