Questions tagged [scikit-learn]

What is scikit-learn?

scikit-learn is a popular machine learning package for Python that provides simple and efficient tools for predictive data analysis. It covers classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It is built on NumPy, SciPy, and matplotlib and is open source under the BSD license. It is part of the scientific Python ecosystem and is suitable for both individual and commercial use.
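
As a rough illustration of the fit/predict workflow the description refers to, here is a minimal sketch. The dataset (iris), the choice of estimator, and the split parameters are assumptions made for the example, not recommendations from the tag wiki.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset (chosen only for illustration).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Mean accuracy on the held-out samples.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```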

New to scikit-learn?

Various resources, including books, tutorials, and workshops, are available for those learning to use scikit-learn.

A popular introductory tutorial is:

SciPy 2018 Conference Tutorial:

A popular introductory book is:

Introduction to Machine Learning with Python, by Andreas C. Müller and Sarah Guido.

scikit-learn Tag usage

When posting questions about scikit-learn, please take the following into consideration:

Tag questions with scikit-learn rather than sklearn; the latter is marked as a synonym and will be retagged automatically.
Explicitly programming-related questions are better suited to Stack Overflow and should not be posted on Stack Exchange Data Science.
Questions should include enough detail and clarity for others to help with the problem at hand. This includes linking to the underlying data, providing the code used to build the model, and highlighting relevant outputs.

External Resources

scikit-learn: Documentation page

scikit-learn: GitHub page

Important links

HTML documentation (development version): http://scikit-learn.org/dev/
Download releases: http://sourceforge.net/projects/scikit-learn/files/
Issue tracker: https://github.com/scikit-learn/scikit-learn/issues
Mailing list: https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

2308 questions

4 answers

Poisson regression options in python

I want to predict count data. As I understand it, neither standard classification nor regression is well suited to this. A Poisson or binomial regression algorithm seems to do the trick. I am used to doing most of my ML tasks in sklearn. But on…

scikit-learn

asked Sep 19 '17 at 15:43

El Burro
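
For readers landing on the Poisson regression question above: recent scikit-learn releases (0.23 and later, to my knowledge) ship `sklearn.linear_model.PoissonRegressor`. The sketch below fits it to synthetic count data; the data-generating process and the `alpha` value are assumptions made purely for illustration and do not come from the question.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic count data (an assumption for this example):
# the log of the expected count is linear in the features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
true_rate = np.exp(0.5 + 1.0 * X[:, 0] - 0.5 * X[:, 1])
y = rng.poisson(true_rate)

# Generalized linear model with a Poisson distribution and log link.
model = PoissonRegressor(alpha=1e-3, max_iter=300)
model.fit(X, y)

# Predictions are expected counts, so they are always positive.
pred = model.predict(X)
print(model.coef_, model.intercept_)
```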

1 answer

Scikit learn: which regressors natively support multi-target regression?

The docs on sklearn.multioutput.MultiOutputRegressor state that it implements a strategy for extending regressors that do not natively support multi-target regression. I'm interested to know: which ones do natively support multi-target regression?…

scikit-learn

asked Dec 20 '17 at 00:36

Нет войне
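
To make the distinction in the multi-target question above concrete, here is a small sketch. It assumes `LinearRegression` as an example of a regressor that accepts a 2D target directly and `GradientBoostingRegressor` as an example of one that expects a single target and therefore needs the wrapper; the synthetic data is invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data with two targets per sample (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 0]])

# LinearRegression accepts a 2D target directly.
native = LinearRegression().fit(X, Y)

# GradientBoostingRegressor is fitted once per target column
# by wrapping it in MultiOutputRegressor.
wrapped = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, Y)

native_pred = native.predict(X)
wrapped_pred = wrapped.predict(X)
print(native_pred.shape, wrapped_pred.shape)
```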

1 answer

Expected 2D array, got scalar array instead: array=11

import numpy as np import matplotlib.pyplot as plt import pandas as pd from sklearn.metrics import r2_score # load the data veriler = pd.read_csv(r'C:\Users\k\Desktop\maaslar_yeni.csv') # here x is the independent variable and y is the dependent variable. x =…

scikit-learn

asked Dec 08 '19 at 16:15

user86600
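
The error in the title of the question above typically appears when a bare scalar is passed to `predict`, which expects a 2D array of shape (n_samples, n_features). The sketch below reproduces the situation with a toy model; the training data and the value 11 are stand-ins, since the original code is truncated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data with a single feature (illustrative only).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

# model.predict(11) would raise:
#   ValueError: Expected 2D array, got scalar array instead
# Reshape the scalar into one sample with one feature instead.
single = np.array(11).reshape(1, -1)
prediction = model.predict(single)
print(prediction)
```

Passing `[[11]]` directly works as well, since it already has the required two dimensions.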

1 answer

Sci-kit Pipeline and GridsearchCV returns indexError: too many indices for array

I'm trying to get to grips with scikit-learn for some simple machine learning projects, but I'm coming unstuck with Pipelines and wonder what I've done wrong... I'm trying to work through a tutorial on Kaggle. Here's my code: import pandas as…

scikit-learn

asked Dec 16 '14 at 01:19

elksie5000

1 answer

Scikitlearn - TfidfVectorizer - how to use a custom analyzer AND still use token_pattern

The docs state that token_pattern is only used if analyzer == 'word': token_pattern : string Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more …

scikit-learn

asked Mar 22 '18 at 17:51

aweeeezy

1 answer

sklearn CountVectorizer token_pattern -- skip token if pattern match

I apologize if this question is misplaced -- I'm not sure if this is more of a re question or a CountVectorizer question. I'm trying to exclude any would-be token that has one or more digits in it. >>> from sklearn.feature_extraction.text import…

scikit-learn

asked Mar 21 '18 at 07:18

aweeeezy
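
One possible approach to the CountVectorizer question above is a `token_pattern` that only matches runs of letters bounded by word boundaries, so that any candidate containing a digit fails to match as a whole. The regular expression and the sample sentence below are assumptions for illustration, not the answer from the original thread.

```python
from sklearn.feature_extraction.text import CountVectorizer

# [^\W\d_] matches a word character that is neither a digit nor an
# underscore, i.e. a letter. Because both ends must sit on a word
# boundary, "test123" and "w0rld" produce no token at all.
pattern = r"(?u)\b[^\W\d_]{2,}\b"

vectorizer = CountVectorizer(token_pattern=pattern)
vectorizer.fit(["hello w0rld 42 test123 foo"])

tokens = sorted(vectorizer.vocabulary_)
print(tokens)
```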

2 answers

Large sparse dataset in Catboost

I have a large sparse data matrix (bag of words, over large number of entries). I can easily treat it as a sparse matrix in sklearn models such as RandomForest. But, if I want to use Catboost, I need to turn it into a dense matrix. I was wondering…

scikit-learn

asked Oct 31 '17 at 23:37

Mojtaba Komeili

1 answer

How does scikit-learn decision function method work?

The scikit-learn docs say it is the signed distance of that sample to the hyperplane. I've taken the sum of the weights and their corresponding coefficient and added the intercept to that sum but this does not return the value given by the…

scikit-learn

asked May 08 '17 at 23:55

berrypy
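
For the decision function question above, the sketch below checks the relationship on a linear model. It assumes a binary `LinearSVC` on synthetic data; for such a model `decision_function` returns the raw score w·x + b, and dividing by the norm of w gives the geometric signed distance. Kernel SVMs compute the score differently, so this does not carry over to them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LinearSVC(max_iter=10000).fit(X, y)

# Raw score computed by hand: dot product with the weights plus intercept.
manual = X @ clf.coef_.ravel() + clf.intercept_[0]
scores = clf.decision_function(X)

# Geometric signed distance to the hyperplane.
distances = scores / np.linalg.norm(clf.coef_)
print(np.allclose(manual, scores))
```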

1 answer

How many features do you generally use for your ML Model?

I am working on a certain Kaggle competition, and users there say that they are using >5000 features and training an XGBoost or Random Forest model on them. The mentioned post is here:…

scikit-learn

asked Nov 14 '15 at 05:51

Rahul Agarwal

1 answer

Criteria used to create and select leaf nodes in sklearn

I just want to know the details of which criteria sklearn.tree.DecisionTreeClassifier uses to create leaf nodes, and how. I know that the parameters criterion{“gini”, “entropy”}, default=”gini” and splitter{“best”, “random”},…

scikit-learn

asked Jul 29 '20 at 20:45

Ivan

2 answers

Is there documentation that explains why scikit-learn does not provide p-values?

Is there documentation, a paper, etc. that explains why scikit-learn does not provide p-values/confidence levels (1, 2, 3, 4)? Note: I'm not asking about opinions, but about documentation. For example the R package lme4 does not provide…

scikit-learn

asked Jun 19 '19 at 12:57

Qaswed

2 answers

Why don't all feature selection methods in sklearn allow specifying desired variance explained?

Why don't all feature selection methods in sklearn allow specifying desired variance explained? sklearn.decomposition.PCA does allow inputting a percentage of variance that one wants to be explained in place of n_components. However other methods…

scikit-learn

asked May 10 '18 at 16:46

mavavilj
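
The PCA behaviour mentioned in the question above can be sketched as follows: passing a float between 0 and 1 as `n_components` asks PCA to keep the smallest number of components whose cumulative explained variance reaches that fraction. The digits dataset and the 0.95 threshold are assumptions chosen for the example.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 64-dimensional digit images (chosen only for illustration).
X, _ = load_digits(return_X_y=True)

# Keep as many components as needed to explain at least 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

explained = pca.explained_variance_ratio_.sum()
print(pca.n_components_, round(explained, 3))
```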

1 answer

Module 'sklearn' has no attribute 'datasets'?

Isn't scikit-learn version 1.0.2 supposed to have an attribute datasets? If so, why am I getting an error? Python 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110] on linux Type "help", "copyright", "credits" or "license" for more…

scikit-learn

asked Feb 16 '22 at 09:28

Tfovid
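
A likely cause of the error in the question above, offered here as an assumption since the full session is truncated, is that `import sklearn` alone does not import the `datasets` submodule in that version. Importing the submodule explicitly avoids the problem:

```python
# Import the submodule explicitly instead of relying on `import sklearn`.
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)
```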

0 answers

Multidimensional scaling (MDS) fails on a simple example

I want to apply multi-dimensional scaling (MDS) on specific objects; using the Euclidean distance does not make sense for such objects; using another distance metric, I can compute their dissimilarity matrix $D$. Then I compute the embeddings of the…

scikit-learn

asked Sep 30 '21 at 17:57

user11634

1 answer

Can I add new features in an existing dataset using function transformers in scikit-learn

I have written code that adds 3 new columns to a NumPy array using a function transformer (the 1st column is element-wise +, the 2nd is element-wise *, and the 3rd is element-wise /). I just need to know whether, in this way, I can add new features to an existing…

scikit-learn

asked Jun 21 '21 at 08:48

Manprit Singh
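
The idea described in the question above can be sketched with `sklearn.preprocessing.FunctionTransformer`. The helper name `add_pairwise_features` is hypothetical, and the choice of the first two columns as operands is an assumption, since the original code is not shown.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def add_pairwise_features(X):
    """Append the sum, product, and quotient of the first two columns.

    Hypothetical helper; which columns to combine is an assumption.
    """
    a, b = X[:, 0], X[:, 1]
    return np.column_stack([X, a + b, a * b, a / b])


transformer = FunctionTransformer(add_pairwise_features)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_new = transformer.fit_transform(X)
print(X_new)
```

Because the transformer is stateless, it can be placed in a Pipeline ahead of an estimator so the same columns are derived at both fit and predict time.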
