
I am moving my best classification model to production and am currently testing it.

Should I use the same scaler() I used in training during my inference in production?

Also, what should I do if I used SMOTE during training? Should I also apply SMOTE to my new incoming data in production?

Thanks

Adam Jaamour
MAA

1 Answer


Should I use the same scaler() I used in training during my inference?

Yes. Data entering your model in production should go through the same transformations as the data used to train it (e.g., normalisation, missing-value handling, scaling, encoding). In practice, this means reusing the scaler object that was fitted on the training data and calling only its transform step, rather than fitting a new scaler on production data. Your model learned patterns in the transformed data, so skipping or changing those transformations would present it with differently distributed inputs that it will likely fail to recognise, reducing performance.

This older question touches on this: "Should we apply transformation to test and new data?"
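As a rough illustration of the point about reusing the training-time scaler, here is a minimal sketch. It assumes the `scaler()` in the question is scikit-learn's `StandardScaler` (the question does not say which one), uses made-up toy data and a `LogisticRegression` as a stand-in for the real model, and serialises with `pickle` in memory; in a real deployment you would likely write the artifact to disk or a model registry instead.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data, made up for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_train = (X_train[:, 0] > 50.0).astype(int)

# Training time: fit the scaler on the training data only.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Persist the fitted scaler together with the model.
artifact = pickle.dumps({"scaler": scaler, "model": model})

# Production: load both objects and call transform(), never fit(),
# so new data is scaled with the training mean and standard deviation.
loaded = pickle.loads(artifact)
X_new = np.array([[65.0, 48.0, 52.0]])  # hypothetical incoming sample
X_new_scaled = loaded["scaler"].transform(X_new)
prediction = loaded["model"].predict(X_new_scaled)
```

Bundling the preprocessing and the model into one saved object (for example a scikit-learn `Pipeline`) is a common way to make sure the two never get out of sync.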


Also, what should I do if I used SMOTE during training? Should I also apply SMOTE to my new incoming data in production?

SMOTE is a technique for oversampling imbalanced datasets. It works by generating synthetic examples of the minority class. It is therefore used only during training, to help the model better fit the minority class. Once trained, your model has already learned the minority-class patterns, so you do not need to apply SMOTE to incoming data in production; apply only the preprocessing transformations mentioned above.

I recommend giving this page a read: SMOTE for Imbalanced Classification with Python.

Adam Jaamour
  • Thanks, very helpful. Shame that I cannot upvote yet because of my reputation status. – MAA Nov 21 '23 at 15:56