2

image

Hello, I am trying to transfer over from excel to pandas.

I want to add new column called 'daily_volume' where if the 'project_name' is equal to the above row project_name then calculate the difference.

For example, 1,424.53 - 1,343.68 = 80.85

My goal is to see 80.85 in row 2 under the newly created 'daily_volume' column for $LONDON Gift.

Basically, the first row for a project name will always be blank.

Shayan Shafiq
  • 1,012
  • 4
  • 12
  • 24
Tom
  • 21
  • 1

1 Answers1

0

If I get you, the idea is to calculate the difference between the current total_volume and its immediately below taking into account the project_name, right?. You can group your dataframe by project_name column, select the total_volume and after that you can use .diff() method, this method make the operation you need.

  1. I'm going to use the following example data:
date total_volume project_name
0 2021-08-26 1343.68 \$LONDON Gift
1 2021-08-26 1424.53 \$LONDON Gift
2 2021-08-26 1800.10 \$LONDON Gift
3 2021-08-26 2345.23 \$GROUP_2
4 2021-08-26 2500.45 \$GROUP_2
5 2021-08-26 2567.76 \$GROUP_3

You can observe there are 3 different groups: $LONDON Gift, $GROUP_2 and $GROUP_3

  1. We create the new column daily_volume as follow:

    df['daily_volume'] = df.groupby('project_name')['total_volume'].diff()
    

Output:

date total_volume project_name daily_volume
0 2021-08-26 1343.68 \$LONDON Gift NaN
1 2021-08-26 1424.53 \$LONDON Gift 80.85
2 2021-08-26 1800.10 \$LONDON Gift 375.57
3 2021-08-26 2345.23 \$GROUP_2 NaN
4 2021-08-26 2500.45 \$GROUP_2 155.22
5 2021-08-26 2567.76 \$GROUP_3 NaN

A final recommendation: Avoid to use image as data, put the code you have in the question, something like: pd.DataFrame({...}).

Carmoreno
  • 113
  • 5