I'm still a beginner in machine learning. Today I tried a simple test with scikit-learn and couldn't achieve what I expected. I created a dataset in which one column contains natural numbers and the other indicates whether each number is even or odd, i.e. only 0 or 1. I then tried several different algorithms, but none of them correctly predicted the output for numbers that were not in the dataset. My question is: which algorithms would be useful for this purpose? I thought, for instance, that logistic regression or a KNN classifier could do the trick. I'm just starting to learn machine learning and felt very disappointed that this simple thing didn't work.

Best regards

  • Related: https://datascience.stackexchange.com/q/71678/64377. A way to understand why this doesn't work is this: you're arbitrarily deciding that these numbers belong to different classes depending on their parity, but that distinction has no relation to their magnitude. Usually a numerical value has a meaning related to low vs. high values (e.g. a price), and that's how learning algorithms interpret it. – Erwan May 11 '20 at 12:43

1 Answer

A fundamentally linear model like logistic regression will never work well here, because its assumptions do not hold for your data set. It presumes that the probability (strictly, the log odds) of the positive class changes linearly with each input, but in your data the class alternates with every integer. KNN's assumption likewise does not match: it assumes that nearby points share a label, yet each integer's nearest neighbors have the opposite classification. Neither will ever work well on this input, because the input doesn't match what they were designed for.
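
A minimal sketch of this failure, assuming scikit-learn and NumPy are installed. The training range (0 to 199), the test range (1000 to 1099), `n_neighbors=3`, and all variable names are illustrative choices, not details from the question:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data (assumed ranges): the raw integer is the only feature,
# and the label is 1 for odd numbers and 0 for even numbers.
X_train = np.arange(0, 200).reshape(-1, 1)
y_train = X_train.ravel() % 2

# Integers the models have never seen.
X_test = np.arange(1000, 1100).reshape(-1, 1)
y_test = X_test.ravel() % 2

raw_scores = {}
for name, model in [
    ("logistic regression", LogisticRegression()),
    ("KNN (k=3)", KNeighborsClassifier(n_neighbors=3)),
]:
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data.
    raw_scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy on unseen integers = {raw_scores[name]:.2f}")
```

Both accuracies should come out at roughly chance level (about 0.5): a one-feature logistic regression can only split the number line at a single threshold, and KNN votes among the nearest training integers, whose labels alternate.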

If, however, you include the parity of the input (X mod 2) as a feature, they should all learn this model trivially.
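
A minimal sketch of the parity feature, assuming scikit-learn and NumPy are installed. The helper `parity_feature`, the number ranges, and `n_neighbors=3` are hypothetical choices made for illustration, and the sketch uses parity as the only feature, which is a simplification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def parity_feature(numbers):
    """Hypothetical helper: map each integer to one engineered feature, n mod 2."""
    return (np.asarray(numbers) % 2).reshape(-1, 1)


# Illustrative ranges (assumed): train on 0..199, test on unseen 1000..1099.
train_numbers = np.arange(0, 200)
test_numbers = np.arange(1000, 1100)

X_train = parity_feature(train_numbers)
X_test = parity_feature(test_numbers)
y_train = train_numbers % 2  # 1 for odd, 0 for even
y_test = test_numbers % 2

parity_scores = {}
for name, model in [
    ("logistic regression", LogisticRegression()),
    ("KNN (k=3)", KNeighborsClassifier(n_neighbors=3)),
]:
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data.
    parity_scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy with parity feature = {parity_scores[name]:.2f}")
```

Here the engineered feature coincides with the label, so the models only have to learn an identity mapping. That is what makes the problem trivial once the right feature is supplied.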

This is a good, simple lesson in feature engineering and in understanding the assumptions you take on when applying an algorithm.

Sean Owen