I am new to machine learning. I want to develop a curriculum vitae (CV) recommender system: I want to measure how similar two CVs are and, given an arbitrary CV, suggest which cluster of CVs it belongs to.
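A common starting point for "how similar are two CVs" is to turn each CV into a TF-IDF vector and compare the vectors with cosine similarity. The sketch below assumes scikit-learn is installed; the function name `cv_similarity` and the sample texts are hypothetical, and fitting the vectorizer on only two documents is a simplification (normally you would fit it once on the whole CV collection).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def cv_similarity(cv_a: str, cv_b: str) -> float:
    """Cosine similarity (0..1) of two CVs under a TF-IDF model (hypothetical helper)."""
    # Learn vocabulary and IDF weights from just these two texts;
    # stop_words="english" drops words such as "and", "with", "in".
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([cv_a, cv_b])
    # Comparing row 0 with row 1 yields a 1x1 matrix; take its single entry.
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


if __name__ == "__main__":
    a = "Python developer with machine learning and data analysis experience"
    b = "Data scientist skilled in Python and machine learning"
    c = "Registered nurse with intensive care and patient care experience"
    # a and b share more weighted terms than a and c, so the first score is higher.
    print(cv_similarity(a, b))
    print(cv_similarity(a, c))
```

Because TF-IDF weights are non-negative, the score ranges from 0 (no shared terms) to 1 (identical term distributions).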
This is what I've already done, following a blog post:
I have a folder containing many CVs (resumes) as plain-text (.txt) files.
I have pre-processed this data with tokenization, stop-word removal, and stemming.
I extracted each candidate's name, email address, contact number, education, and experience.
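Loading the folder and pulling out contact details can be sketched with only the standard library. The helper names `load_cvs` and `extract_contacts` and both regular expressions are assumptions for illustration, not a standard: the patterns are deliberately simple and will miss some valid email and phone formats while occasionally matching other digit runs (for example a date range written as "2015 - 2019").

```python
import re
from pathlib import Path

# Simple, hypothetical patterns; tune them to the formats in your own CVs.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# An optional "+", then at least 10 characters of digits, spaces, ( ) . -
# starting and ending with a digit. May over-match some date ranges.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{8,}\d")


def load_cvs(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a {filename: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="ignore")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return email addresses and phone-like numbers found in raw CV text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [m.strip() for m in PHONE_RE.findall(text)],
    }
```

Running `extract_contacts` on the raw text, before tokenization and stemming, keeps addresses and numbers intact.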
I am confused about how to train on this data and how to create a model for it. More specifically, I have the following questions:
How do I create a model from text data?
Which algorithm should I apply to this data?
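One common (but not the only) answer to both questions is an unsupervised pipeline: vectorize the CVs with TF-IDF, cluster the vectors with k-means, then assign a new CV to the nearest cluster and rank existing CVs by cosine similarity. The sketch below assumes scikit-learn; the class `CVClusterer`, its method names, and the default of 3 clusters are hypothetical choices, not tuned values.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class CVClusterer:
    """TF-IDF + k-means over a collection of CV texts (hypothetical sketch)."""

    def __init__(self, n_clusters: int = 3, random_state: int = 0):
        # TfidfVectorizer tokenizes and removes English stop words itself and
        # L2-normalizes each row, so Euclidean k-means roughly tracks cosine distance.
        self.vectorizer = TfidfVectorizer(stop_words="english", sublinear_tf=True)
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)

    def fit(self, cvs: list[str]) -> "CVClusterer":
        # Learn the vocabulary/IDF weights, then cluster the TF-IDF vectors.
        self.matrix = self.vectorizer.fit_transform(cvs)
        self.kmeans.fit(self.matrix)
        return self

    def predict_cluster(self, cv: str) -> int:
        # Reuse the fitted vocabulary; words never seen during fit are ignored.
        return int(self.kmeans.predict(self.vectorizer.transform([cv]))[0])

    def most_similar(self, cv: str, top_k: int = 3) -> list[int]:
        # Indices of the training CVs closest to `cv`, best match first.
        scores = cosine_similarity(self.vectorizer.transform([cv]), self.matrix)[0]
        return scores.argsort()[::-1][:top_k].tolist()
```

If your CVs are already tokenized and stemmed, join each token list back into one string before calling `fit`. Choosing the number of clusters is a judgment call; comparing `sklearn.metrics.silhouette_score` across several values is one way to pick it.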
Any help would be appreciated.
Thanks.