I'm trying to find the patterns in the crime records I have in a database. I thought clustering would be a way to do it.
This is my (cooked up) dataset:
age,nationality,country_of_birth,place_of_birth,no_of_checkedinbaggage,noofcabinbaggage,no_of_co_passengers,watchlist
34,GBR,GBR,London,2,1,0,Drug Trafficker
32,IND,IND,Delhi,2,1,0,Human Trafficker
31,USA,USA,Tampa,2,1,0,Arms Dealer
.....
Basically, I want to identify clusters of watchlist categories and see whether they show a pattern. Based on those clusters, I'd also like to make predictions for future data.
Is clustering (K-Means) the correct choice? Also, do all the variables have to be numeric? If so, I'm not sure how to encode the categorical ones (nationality, country_of_birth, place_of_birth) as numbers. Thoughts?
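To make the encoding question concrete, here is a minimal sketch of one-hot encoding the three sample rows above, using only the Python standard library. The helper name `one_hot_encode`, the choice of which columns count as categorical, and keeping `watchlist` aside as a label rather than a feature are all my own assumptions, and I'm not sure this is the right encoding for K-Means:

```python
import csv
import io

# Sample rows copied from the dataset above (the elided rows are not included).
RAW = """age,nationality,country_of_birth,place_of_birth,no_of_checkedinbaggage,noofcabinbaggage,no_of_co_passengers,watchlist
34,GBR,GBR,London,2,1,0,Drug Trafficker
32,IND,IND,Delhi,2,1,0,Human Trafficker
31,USA,USA,Tampa,2,1,0,Arms Dealer
"""

# Assumption: these columns are categorical; every other feature is numeric.
CATEGORICAL = ["nationality", "country_of_birth", "place_of_birth"]
# Assumption: watchlist is kept aside to inspect clusters, not fed in as a feature.
LABEL = "watchlist"


def one_hot_encode(rows, categorical, label):
    """Return (feature_names, vectors, labels).

    Each categorical column is expanded into one 0/1 column per distinct value;
    numeric columns are converted to float and kept as-is.
    """
    # Sort the distinct values so the column order is stable across runs.
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    numeric = [c for c in rows[0] if c not in categorical and c != label]
    names = numeric + [f"{col}={v}" for col in categorical for v in levels[col]]

    vectors = []
    for r in rows:
        vec = [float(r[c]) for c in numeric]
        for col in categorical:
            vec.extend(1.0 if r[col] == v else 0.0 for v in levels[col])
        vectors.append(vec)
    return names, vectors, [r[label] for r in rows]


rows = list(csv.DictReader(io.StringIO(RAW)))
feature_names, X, y = one_hot_encode(rows, CATEGORICAL, LABEL)

for name, value in zip(feature_names, X[0]):
    print(f"{name}: {value}")
```

The resulting `X` is what I would pass to K-Means. Since K-Means relies on distances between rows, I suspect age and the 0/1 columns would need to be scaled to a comparable range first, but I'm unsure about that.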