
I'm using a random forest model. One of my independent variables almost certainly has a parabolic effect on the dependent variable. In a linear regression, I would include both a linear and a squared term for that variable in order to capture this effect. Should I do the same in a random forest?

By extension, same question for variables that have logistic effects (age, for example)?

Jesse

1 Answer


A random forest is an ensemble of decision trees, each built with the CART algorithm.

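Since CART (Classification And Regression Trees) is a non-parametric algorithm, the trees should be able to find non-linear effects and interactions between variables on their own: each split is a threshold on a single variable, so a parabola is approximated by a step function without you having to specify a squared term.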
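To illustrate this, here is a minimal sketch, written in pure Python rather than with any particular library, of a small forest of bagged CART-style regression trees fit to hypothetical data with a purely parabolic effect and no squared column. The helper names (`best_split`, `grow_tree`, `fit_forest`, `predict_forest`) and the settings (25 trees, depth 5) are illustrative assumptions. With only one feature, a random forest's per-split feature sampling has nothing to choose from, so the sketch reduces to plain bagging.

```python
import random


def best_split(xs, ys):
    """Best single threshold on one feature, scored by sum of squared errors (CART regression)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sum = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    left_sum = left_sq = 0.0
    best_threshold, best_sse = None, float("inf")
    for i in range(1, n):
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev
        left_sq += y_prev * y_prev
        if pairs[i][0] == x_prev:
            continue  # no threshold separates equal feature values
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # SSE of each side around its own mean, computed from running sums
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if sse < best_sse:
            best_threshold, best_sse = (x_prev + pairs[i][0]) / 2, sse
    return best_threshold, best_sse


def grow_tree(xs, ys, depth):
    """Grow a depth-limited regression tree; a leaf is the mean target (a float)."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(ys) < 2:
        return mean
    threshold, _ = best_split(xs, ys)
    if threshold is None:
        return mean
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return (
        threshold,
        grow_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        grow_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x):
    # Walk down the tree until a leaf (a plain float) is reached
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=25, depth=5, seed=0):
    """Fit each tree on a bootstrap sample of the rows (bagging)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([xs[i] for i in idx], [ys[i] for i in idx], depth))
    return forest


def predict_forest(forest, x):
    # Average the step functions of all trees
    return sum(predict(t, x) for t in forest) / len(forest)


# Hypothetical data: y = x^2 + noise, and the model only ever sees x
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(400)]
ys = [x * x + rng.gauss(0, 0.5) for x in xs]
forest = fit_forest(xs, ys)

# Compare against the true curve on a grid inside the training range
grid = [i / 10 for i in range(-25, 26)]
truth = [x * x for x in grid]
preds = [predict_forest(forest, x) for x in grid]
mean_truth = sum(truth) / len(truth)
residual = sum((p - t) ** 2 for p, t in zip(preds, truth))
spread = sum((t - mean_truth) ** 2 for t in truth)
r2 = 1 - residual / spread
print(f"R^2 against the true curve: {r2:.3f}")
```

Each tree is a step function; averaging bootstrapped trees smooths the steps, but the forest still needs enough data around each part of the curve to place splits there.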

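Nevertheless, adding polynomial terms can still improve their performance. For example, a single threshold on a squared term separates the bottom of a parabola from both of its tails, which takes at least two splits on the original variable.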
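A second minimal sketch, under the same assumptions (pure Python, hypothetical data, helper names invented for illustration), compares the best single split on x with the best single split on x², and then checks the logistic case from the question. A squared term can help when x spans both sides of the vertex, because x² is not monotone there and reorders the rows. A logistic transform of age, by contrast, is strictly increasing, so it orders the rows exactly as age does and offers a tree no new partitions.

```python
import math
import random


def sse_around_mean(values):
    """Sum of squared errors around the mean: the CART regression impurity of one node."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_stump(feature, target):
    """Best single threshold split on one feature.

    Returns (sse, frozenset of row indices sent to the left child)."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best_sse, best_left = float("inf"), frozenset()
    for cut in range(1, len(order)):
        if feature[order[cut]] == feature[order[cut - 1]]:
            continue  # no threshold can separate tied values
        left, right = order[:cut], order[cut:]
        sse = sse_around_mean([target[i] for i in left]) + sse_around_mean(
            [target[i] for i in right]
        )
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_sse, best_left


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)

# Hypothetical parabolic effect with its vertex at x = 0
x = [rng.uniform(-3, 3) for _ in range(200)]
y = [xi * xi + rng.gauss(0, 0.5) for xi in x]
sse_x, _ = best_stump(x, y)
sse_x_squared, _ = best_stump([xi * xi for xi in x], y)
print(f"one split on x:   SSE = {sse_x:.1f}")
print(f"one split on x^2: SSE = {sse_x_squared:.1f}")

# Hypothetical logistic effect of age; the engineered feature uses the same curve
age = [rng.uniform(18, 80) for _ in range(200)]
outcome = [logistic((a - 45) / 10) + rng.gauss(0, 0.1) for a in age]
logistic_age = [logistic((a - 45) / 10) for a in age]
sse_age, left_age = best_stump(age, outcome)
sse_logistic, left_logistic = best_stump(logistic_age, outcome)
print("same training rows on each side:", left_age == left_logistic)
```

The same ordering argument applies to any strictly monotone transform of a single variable (log, square root, logistic): for a tree it can change the numeric threshold stored in a split, but not which training rows end up on each side.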

Carlos Mougan
    +1; as an example of the last statement, see the notebook linked in https://datascience.stackexchange.com/a/61279/55122 – Ben Reiniger Mar 25 '20 at 02:32