
I want to predict count data. As I understand it, neither standard classification nor standard regression is well suited to this. A Poisson or binomial regression algorithm seems to do the trick.

I am used to doing most of my ML tasks in sklearn, but I could not find an implementation for this there. Are there any suitable options within the Python universe?

Stephen Rauch
El Burro

4 Answers


Not quite sklearn, but have you tried xgboost?

The XGBRegressor in xgboost accepts many different objective functions, including the Poisson objective count:poisson for count data.

It also plays nicely with sklearn, so it can be used with grid search, pipelines, etc.

johnaphun
  • This sounds promising - I will give it a try - though I am not sure when I will have time for this ;). – El Burro Oct 15 '18 at 06:46
  • Sorry for the late reply. I think that, while interesting as an objective function, this is not what I am looking for. I am looking for a regressor whose predictions are only integers, and at least when I tried this one, it did not do that. – El Burro Aug 02 '19 at 14:32
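Non-integer output is expected here: a Poisson regressor (xgboost's count:poisson, a statsmodels GLM, sklearn's PoissonRegressor) predicts the conditional mean λ of the count, which is generally not an integer. If an integer point prediction is needed, one option is the mode of the predicted Poisson distribution, floor(λ). The sketch below assumes you already have predicted means from some model; the helper names poisson_pmf and poisson_mode are hypothetical.

```python
import numpy as np
from math import exp, lgamma, log

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed in log space; needs lam > 0."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def poisson_mode(lam):
    """Most likely count under Poisson(lam), which is floor(lam).
    When lam is an exact integer, lam - 1 is an equally likely mode."""
    return np.floor(np.asarray(lam, dtype=float)).astype(int)

predicted_means = np.array([0.4, 2.7, 3.0, 5.5])  # e.g. the output of model.predict(X)
print(poisson_mode(predicted_means))  # [0 2 3 5]
```

Rounding (np.rint) is another simple choice; which integer is "best" depends on the loss you care about (the mode minimises the chance of being wrong, the median minimises absolute error).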

statsmodels has you covered.

There aren't a lot of great examples of Poisson regression in the statsmodels documentation, but if you're happy with GLMs, statsmodels has a GLM API that lets you choose the response distribution from several exponential-family distributions, including Poisson.

R Hill
  • Do you know of any non-linear models that support Poisson regression, or other methods to predict count data? – El Burro Sep 20 '17 at 08:06
  • Can you clarify what you mean by "non-linear" in this context? – R Hill Sep 20 '17 at 16:06
  • Like a neural network, some of the many variants of decision trees, support vector machines, etc. – El Burro Sep 20 '17 at 16:16
  • Okay. Well, regular Poisson regression models the log of the Poisson rate as a linear combination of your predictor variables, so you could replace that linear combination with any non-linear transformation you like. For example, you could build a neural network whose output layer gives a point estimate of the Poisson rate. This would, however, be a lot more complicated than regular GLM Poisson regression, and a lot harder to diagnose or interpret. It's probably worth trying a standard Poisson regression first to see if that suits your needs. – R Hill Sep 20 '17 at 16:23
  • You probably need to write down a loss function equal to the negative log-likelihood of the Poisson distribution. – Kota Mori Nov 14 '18 at 22:56
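The two comments above can be combined in a small sketch: a one-hidden-layer neural network whose output is the log of the Poisson rate, trained by full-batch gradient descent on the Poisson negative log-likelihood. It is written in plain NumPy to make the loss and its gradient explicit; the function names, network size, learning rate and synthetic data are all hypothetical choices, and in practice a deep-learning library with a built-in Poisson loss would be more convenient.

```python
import numpy as np

def poisson_nll(eta, y):
    """Mean Poisson negative log-likelihood with rate exp(eta).
    The log(y!) term is dropped because it does not depend on the model."""
    return np.mean(np.exp(eta) - y * eta)

def forward(X, params):
    W1, b1, w2, b2 = params
    h = np.tanh(X @ W1 + b1)  # non-linear hidden layer
    eta = h @ w2 + b2         # log of the Poisson rate (log link)
    return h, eta

def train_poisson_mlp(X, y, hidden=8, lr=0.02, steps=4000, seed=0):
    """Full-batch gradient descent on the Poisson NLL (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = [
        rng.normal(size=(d, hidden)),  # W1: random, breaks symmetry between hidden units
        np.zeros(hidden),              # b1
        np.zeros(hidden),              # w2: zero, so training starts from a constant rate
        np.log(y.mean()),              # b2: log of the mean count
    ]
    for _ in range(steps):
        h, eta = forward(X, params)
        d_eta = (np.exp(eta) - y) / n                      # dLoss/d(eta) for the Poisson NLL
        d_z = np.outer(d_eta, params[2]) * (1.0 - h ** 2)  # backpropagate through tanh
        grads = [X.T @ d_z, d_z.sum(axis=0), h.T @ d_eta, d_eta.sum()]
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

# Synthetic counts with a non-linear true rate: y ~ Poisson(exp(sin(x) + 0.5))
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(2000, 1))
y = rng.poisson(np.exp(np.sin(X[:, 0]) + 0.5))

params = train_poisson_mlp(X, y)
_, eta = forward(X, params)
predicted_rate = np.exp(eta)

baseline_loss = poisson_nll(np.full(len(y), np.log(y.mean())), y)  # constant-rate model
trained_loss = poisson_nll(eta, y)
print(baseline_loss, trained_loss)
```

Swapping the hidden layer for a plain linear term (eta = X @ w + b) recovers ordinary Poisson regression, which is the point R Hill makes about replacing the linear combination.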

You can use PoissonRegressor, or even RandomForestRegressor, in sklearn. I think you can use other common regressors too; that is not a problem. The choice depends on your evaluation metrics.
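A minimal sketch of that approach, assuming scikit-learn 1.0 or later (needed for criterion="poisson" in RandomForestRegressor): fit both models on the same synthetic count data and compare them with mean_poisson_deviance, one evaluation metric suited to counts. The data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import train_test_split

# Synthetic counts: log E[y | x] = 0.5 + 0.3*x0 - 0.2*x1
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 2))
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1]))
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GLM with a log link; alpha=0.0 turns off the default L2 penalty
glm = PoissonRegressor(alpha=0.0).fit(X_train, y_train)

# Tree ensemble split on Poisson deviance instead of squared error
forest = RandomForestRegressor(
    n_estimators=50, criterion="poisson", min_samples_leaf=20, random_state=0
).fit(X_train, y_train)

scores = {}
for name, model in [("poisson_glm", glm), ("random_forest", forest)]:
    pred = model.predict(X_test)
    # Poisson deviance is only defined for strictly positive predictions
    scores[name] = mean_poisson_deviance(y_test, np.clip(pred, 1e-9, None))
print(scores)  # lower deviance is better
```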

Mahdi Fa