30

I am building a multinomial logistic regression with sklearn (LogisticRegression). After fitting it, how can I get p-values and confidence intervals for the model's coefficients? It appears that sklearn only provides the coefficients and the intercept.

Thank you very much.

hminle

3 Answers

15

The short answer is that sklearn's LogisticRegression does not have a built-in method to calculate p-values. However, here are a few other posts that discuss solutions to this:

https://stackoverflow.com/questions/27928275/find-p-value-significance-in-scikit-learn-linearregression

https://stackoverflow.com/questions/22306341/python-sklearn-how-to-calculate-p-values

Hobbes
14

One way to get confidence intervals is to bootstrap your data: resample it with replacement $B$ times and fit a logistic regression model $m_i$ to each bootstrap dataset $B_i$ for $i = 1, 2, \dots, B$. The fitted coefficients across the $B$ models give you an empirical distribution for each parameter you are estimating, and its percentiles (for example, the 2.5th and 97.5th) give the confidence intervals.
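
A minimal sketch of this bootstrap, assuming synthetic data from `make_classification` and a percentile interval. The function name `bootstrap_coef_ci` and its parameters are illustrative, not part of sklearn. Note that LogisticRegression applies L2 regularization by default, so these intervals describe the penalized estimator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def bootstrap_coef_ci(X, y, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for LogisticRegression coefficients.

    Illustrative helper (not an sklearn API). Returns two arrays with the
    same shape as `coef_`: the lower and upper interval bounds.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_classes = np.unique(y).size
    coefs = []
    for _ in range(n_boot):
        # Resample row indices with replacement.
        idx = rng.integers(0, n, size=n)
        # Skip the rare resample that misses a class entirely,
        # since coef_ would then have a different shape.
        if np.unique(y[idx]).size < n_classes:
            continue
        model = LogisticRegression(max_iter=1000)
        model.fit(X[idx], y[idx])
        coefs.append(model.coef_)
    coefs = np.stack(coefs)  # (n_kept_resamples, n_classes, n_features)
    lower = np.percentile(coefs, 100 * alpha / 2, axis=0)
    upper = np.percentile(coefs, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


# Assumed toy data: 3 classes, 4 features.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
lower, upper = bootstrap_coef_ci(X, y)
print(lower.shape, upper.shape)
```

A coefficient whose interval excludes zero is a rough analogue of a significant p-value, although the two are not identical.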

darXider
6

This is still not implemented and is not planned, as it seems to be out of scope for sklearn, as per GitHub discussions #6773 and #13048.

However, the documentation on linear models now mentions the following (P-value estimation note):

  • It is theoretically possible to get p-values and confidence intervals for coefficients in cases of regression without penalization.
  • The statsmodels package natively supports this.
  • Within sklearn, one could use bootstrapping.

It also appears possible to modify the LinearRegression class to calculate p-values from linear algebra, as per this GitHub code.
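
The linked code is not reproduced here. As a hedged sketch of the general idea for ordinary least squares, the standard errors come from the diagonal of $\hat{\sigma}^2 (X^\top X)^{-1}$ and the p-values from a two-sided t-test. The function name `ols_p_values` and the toy data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Coefficients, standard errors, and two-sided p-values for OLS.

    Illustrative helper: an intercept column is prepended to X.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = n - Xd.shape[1]                    # residual degrees of freedom
    sigma2 = resid @ resid / dof             # unbiased noise variance estimate
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)  # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    t_stat = beta / se
    p = 2 * stats.t.sf(np.abs(t_stat), dof)
    return beta, se, p


# Assumed toy data: y depends on the first feature only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(size=200)

beta, se, p = ols_p_values(X, y)
print(beta, se, p)
```

This applies to unpenalized linear regression only. For logistic regression the analogous standard errors come from the inverse of the observed information matrix rather than from $(X^\top X)^{-1}$.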

Lucas Morin