
Suppose we have customer data from 2015 to 2019. I want to split it, in the manner of train_test_split(), into three sets by year: set 1 covers 2015 to 2017 (3 years) and is used to train the model, set 2 is 2018 (1 year) and is used to validate the model, and set 3 is 2019 (1 year) and is used to test the model. I am looking for code that divides the data into these three sets based on time (years).

karan
  • 1
    Welcome to DS StackExchange. Please elaborate more on the question. What are you asking exactly? As of now, it's not possible to help you. – Leevo Jan 16 '20 at 15:36
  • 1
    Hello, suppose we have customer data from 2015 to 2019. I want to split it, in the manner of train_test_split(), into three sets: set 1 covers 2015 to 2017 (3 years) for training, set 2 is 2018 (1 year) for validation, and set 3 is 2019 (1 year) for testing. I want code that divides the data into these three sets based on time (years). – karan Jan 16 '20 at 16:21
  • Thank you! Please update the main question with this information – Leevo Jan 16 '20 at 16:24
  • Do you realize that this is a terrible idea from a machine learning perspective? The year is very likely a significant factor and you are removing 2 years completely from the learning process. – Pieter21 Jan 16 '20 at 21:10

1 Answer


It seems to me the best (or at least the quickest) way to do this is to load all the data into a pandas DataFrame, build a boolean mask on the year column for each group, and use each mask to create a new DataFrame. For example:

# Assumes the 'year' column holds strings; compare against integers (2015, ...) if it is numeric.
train_df = data[data['year'].isin(['2015', '2016', '2017'])]
validate_df = data[data['year'] == '2018']
test_df = data[data['year'] == '2019']
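If the data has a date column rather than a ready-made year column, the same masking idea still applies. The sketch below is a self-contained illustration under that assumption; the toy data and the column names `customer_id` and `date` are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical toy data: one row per customer record, with a date column.
data = pd.DataFrame({
    "customer_id": range(10),
    "date": pd.to_datetime([
        "2015-03-01", "2015-11-15", "2016-06-30", "2016-12-31", "2017-01-01",
        "2017-08-20", "2018-02-14", "2018-10-05", "2019-04-09", "2019-12-25",
    ]),
})

# Derive an integer year so the comparisons do not depend on string formatting.
data["year"] = data["date"].dt.year

# Boolean masks select the rows for each period.
train_df = data[data["year"].between(2015, 2017)]  # inclusive on both ends
validate_df = data[data["year"] == 2018]
test_df = data[data["year"] == 2019]

# Sanity check: every row lands in exactly one of the three sets.
assert len(train_df) + len(validate_df) + len(test_df) == len(data)
```

Working with integer years avoids the silent empty result you get when string values such as '2018' are compared against a numeric column.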

Hope this is what you're looking for. If not, let me know and we can work out another solution.

whege