0

Hi i'm working on a practice project on amazon movie reviews and i did everything that was asked except building a model that will recommend movies to users which have not been watched nor been rated. but when i'm trying to train and split the model I have a problem on assigning x/features and y/target as user_id's are not numeric e.g 'A3R5OBKS7OM2IR'. (The data contains 207 columns with user_id column which contains 4.9k users and 206 movies(movie 1 to movie 206 with ratings corresponding with the respected user_id)

Brian Spiering
  • 21,136
  • 2
  • 26
  • 109
Yaya
  • 1

2 Answers2

1

I'm not sure why you would include user ID as a predictive variable, unless there is specifically some part of the user ID that gives you information. For example, maybe the first three letters tell you that the user is a premium user and then you could feature engineer a category from that.

Victor Ng
  • 370
  • 1
  • 7
0

If you want to assign any non-numeric value to a numeric value, you'll can hash them.

In Python, it would look like:


user_id = "A3R5OBKS7OM2IR"

user_id_numeric = hash(user_id)

Brian Spiering
  • 21,136
  • 2
  • 26
  • 109