Should I apply the same data transformations in production for my classification model's inference steps?

Question

I am moving my best classification model to production and am currently testing it.

Should I use the same scaler() I used in training during my inference in production?

Also, what should I do if I used SMOTE during training? Should I also apply SMOTE to my new incoming data in production?

Thanks

Please consider upvoting/accepting the answer if it helped. — Adam Jaamour, Nov 27 '23 at 17:42

score 1 · Answer 1 · answered Nov 21 '23 at 13:48

Should I use the same scaler() I used in training during my inference?

Yes. Data coming into your model in production should go through the same transformations as the data you used to train it (e.g., normalisation, missing-value handling, scaling, encoding). In practice, this means reusing the scaler object that was fitted on the training data and only calling its transform step on new data, rather than fitting a new scaler on production data. Your model learned the patterns found in the transformed data, so skipping or changing those transformations would present it with differently distributed inputs that it will likely not pick up on, reducing performance.
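As a rough illustration, here is a minimal sketch assuming scikit-learn with a `StandardScaler` and a `LogisticRegression` (the question does not say which scaler or model was used, so both are stand-ins, and the data and file name are made up). The scaler is fitted once on the training data, saved together with the model, and reused unchanged at inference time:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_train = (X_train[:, 0] > 50.0).astype(int)

# Training: the scaler's mean and variance are learned from the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Persist the fitted scaler and classifier together as one artifact.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical location
joblib.dump(model, path)

# Production: load the artifact and call predict().
# The pipeline applies the already-fitted scaler's transform(); nothing is refitted.
loaded = joblib.load(path)
X_new = np.array([[65.0, 48.0, 52.0]])
pred = loaded.predict(X_new)
```

Bundling the scaler and classifier in one pipeline is a design choice rather than a requirement. Saving the fitted scaler separately and calling `scaler.transform(X_new)` before `model.predict(...)` achieves the same thing, as long as `fit` is never called on production data.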

This older question touches on the same point: "Should we apply transformation to test and new data?"

Also, what should I do if I used SMOTE during training? Should I also apply SMOTE to my new incoming data in production?

SMOTE is a technique for oversampling imbalanced datasets. It works by generating synthetic examples of the minority class to address the imbalance. It is therefore only used during training, to better fit the model to the minority class's data. Your model has already learned the patterns of the minority class, so you don't need to apply SMOTE to incoming data in production; apply only the preprocessing transformations mentioned above.

I recommend giving this page a read: SMOTE for Imbalanced Classification with Python.

Thanks, very helpful. Shame that I cannot upvote yet because of my reputation status. — MAA, Nov 21 '23 at 15:56
