I am looking for an example of how to use train_test_split with an existing dataset. I have a CSV that can be brought into a pandas DataFrame with:
import pandas as pd
data = pd.read_csv(r'c:\MyData.csv')
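As a minimal sketch, assume the CSV holds some feature columns plus one label column. The names feature_1, feature_2 and label below are hypothetical. The DataFrame returned by read_csv can then be split column-wise into X and y. Here io.StringIO stands in for c:\MyData.csv so the example runs on its own:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for c:\MyData.csv.
csv_text = """feature_1,feature_2,label
0.1,1.2,0
0.4,0.9,0
2.3,3.1,1
0.2,1.1,0
"""

# read_csv accepts any file-like object, so StringIO behaves like a file on disk.
data = pd.read_csv(io.StringIO(csv_text))

# X: every column except the (hypothetical) label column; y: the label column.
X = data.drop(columns="label")
y = data["label"]

print(X.shape)     # (4, 2)
print(y.tolist())  # [0, 0, 1, 0]
```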
My aim is to use this data with a One-Class SVM. However, the train_test_split examples I have found all seem to generate a random dataset and then use that. This is usually done with X, y = followed by a call that generates data with whatever parameters you choose.
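For comparison, here is roughly what those generated-data examples do. make_blobs is just one such generator, and the parameters below are illustrative. train_test_split only needs an X with one row per sample and, optionally, a y of the same length. Columns from your own DataFrame can therefore take the place of the generated arrays.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Typical documentation pattern: generate a synthetic dataset.
# X is a 2D array of shape (n_samples, n_features); y holds one label per row.
X, y = make_blobs(n_samples=100, n_features=2, centers=2, random_state=0)

# The generated arrays are then split exactly as in the sklearn example.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

print(X.shape, y.shape)           # (100, 2) (100,)
print(len(X_train), len(X_test))  # 67 33
```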
Looking at the sklearn documentation, you then use:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
This would hold out 33% of the data as the test set. How do I specify that X, y = should come from data (my DataFrame), or am I barking up completely the wrong tree and not thinking about this correctly?
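A sketch of one way to do this, under some assumptions. A One-Class SVM is fit without labels, and train_test_split accepts any number of arrays, so the DataFrame can be passed on its own as X. A random DataFrame with hypothetical column names stands in for the CSV, and nu=0.1 is an arbitrary illustrative setting rather than a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

# Stand-in for: data = pd.read_csv(r'c:\MyData.csv')
# (hypothetical numeric feature columns; replace with your own file).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 2)), columns=["feature_1", "feature_2"])

# Use the DataFrame itself as X. With a single array, train_test_split
# returns two pieces (train, test) instead of four.
X_train, X_test = train_test_split(data, test_size=0.33, random_state=42)

# Fit the One-Class SVM on the training rows only (no labels needed).
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
clf.fit(X_train)

# predict returns +1 for inliers and -1 for outliers.
predictions = clf.predict(X_test)
print(len(X_train), len(X_test))  # 67 33
print(np.unique(predictions))
```

If the CSV also contains a label column you want to keep for evaluation, separate it first and pass it as a second array, e.g. train_test_split(X, y, test_size=0.33, random_state=42). You then get four pieces back, as in the sklearn example.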
Any help, as always, is greatly appreciated.