This transformation is called min-max scaling and is also often referred to as normalization. Scikit-learn provides `MinMaxScaler` for this. Here is an example adapted from "Introduction to Machine Learning with Python" by Mueller and Guido:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,
                                                    random_state=1)

scaler = MinMaxScaler()
scaler.fit(X_train)                        # learn per-feature min and max from the training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training min and max on the test data
```
(Side note: fit the scaler on the training data only, never on the test data!)
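As a sanity check on what `MinMaxScaler` computes, the sketch below applies the per-column formula x' = (x - min) / (max - min) by hand to a small made-up array and compares the result with the scaler's output. The toy data is an assumption for illustration only; it has no constant columns, so the division is safe.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small made-up training set (illustration only, no constant columns)
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [4.0, 40.0]])

# Min-max scaling by hand: x' = (x - min) / (max - min), computed per column
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
X_manual = (X_train - col_min) / (col_max - col_min)

# The same transformation via scikit-learn
scaler = MinMaxScaler().fit(X_train)
X_sklearn = scaler.transform(X_train)

print(np.allclose(X_manual, X_sklearn))  # True
print(scaler.data_min_, scaler.data_max_)  # per-column min and max learned during fit
```

The fitted attributes `data_min_` and `data_max_` hold the training statistics; `transform` reuses them for any later data, which is why fitting on the training set alone is enough.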
In the book "Python Machine Learning", Raschka gives a brief, pragmatic comparison of min-max scaling (normalization) and standardization (the latter means subtracting the mean and dividing by the standard deviation):
> Although normalization via min-max scaling is a commonly used technique that is useful when we need values in a bounded interval, standardization can be more practical for many machine learning algorithms. The reason is that many linear models, such as the logistic regression and SVM, [...] initialize the weights to 0 or small random values close to 0. Using standardization, we center the feature columns at mean 0 with standard deviation 1 so that the feature columns take the form of a normal distribution, which makes it easier to learn the weights. Furthermore, standardization maintains useful information about outliers and makes the algorithm less sensitive to them in contrast to min-max scaling, which scales the data to a limited range of values.
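The contrast in the quote can be made concrete on a toy feature. The sketch below uses made-up data (an assumption for illustration): it checks that standardization is z = (x - mean) / std per column, matching `StandardScaler`, and then applies both scalers to a feature containing one outlier.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up feature: four ordinary values and one outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Standardization by hand: z = (x - mean) / std per column.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also the default of np.std.
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)
z_sklearn = StandardScaler().fit_transform(x)
print(np.allclose(z_manual, z_sklearn))  # True

# Min-max scaling of the same feature
m = MinMaxScaler().fit_transform(x)

print(z_sklearn.ravel())  # approx. -0.54, -0.51, -0.49, -0.46, 2.0
print(m.ravel())          # approx. 0, 0.01, 0.02, 0.03, 1
```

Min-max scaling pins the outlier to 1 and packs the four ordinary values into roughly [0, 0.03], whereas the standardized values have mean 0 and standard deviation 1 but are not confined to a fixed interval. Both are linear per-feature transformations, so neither changes the ordering of the values.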
`scaler.fit(X_train)` (fitting on train data) and then `X_train_scaled = scaler.transform(X_train)` (applying to train data) and `X_test_scaled = scaler.transform(X_test)` (applying to test data). There is nothing like `scaler.fit(X_test)`, but you are nevertheless correct that the test data will not be scaled to have a min of 0 and a max of 1. – Jonathan Mar 05 '20 at 18:14
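The last point can be checked with a tiny made-up example (values chosen only for illustration): the scaler learns min = 0 and max = 10 from the training column, so test values outside that range are mapped outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up one-feature data: the test set contains values outside the training range
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[-2.0], [4.0], [12.0]])

scaler = MinMaxScaler().fit(X_train)  # learns min=0 and max=10 from the training data only
X_test_scaled = scaler.transform(X_test)

print(X_test_scaled.ravel())  # approx. -0.2, 0.4, 1.2
```

If bounded values are required downstream, `MinMaxScaler` accepts `clip=True` (scikit-learn 0.24 and later), which clips transformed values to `feature_range`.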