Highest Voted Questions - Data Science Stack Exchange

147 votes, 13 answers

Why do people prefer Pandas to SQL?

I've been using SQL since 1996, so I may be biased. I've used MySQL and SQLite 3 extensively, but have also used Microsoft SQL Server and Oracle. The vast majority of the operations I've seen done with Pandas can be done more easily with SQL. This…

asked Jul 12 '18 at 09:25 by vy32 (reputation 601; badges 3, 7, 10)

132 votes, 1 answer

How to get correlation between two categorical variables and between a categorical variable and a continuous variable?

I am building a regression model and I need to calculate the following to check for correlations: correlation between two multi-level categorical variables; correlation between a multi-level categorical variable and a continuous variable; VIF (variance…

asked Aug 03 '14 at 13:07 by GeorgeOfTheRF (reputation 2,028; badges 5, 17, 20)

127 votes, 14 answers

Python vs R for machine learning

I'm just starting to develop a machine learning application for academic purposes. I'm currently using R and training myself in it. However, in a lot of places, I have seen people using Python. What are people using in academia and industry, and…

asked Jun 12 '14 at 06:04 by user721 (reputation 159; badges 2, 3, 3)

124 votes, 2 answers

Training an RNN with examples of different lengths in Keras

I am trying to get started learning about RNNs and I'm using Keras. I understand the basic premise of vanilla RNN and LSTM layers, but I'm having trouble understanding a certain technical point for training. In the keras documentation, it says the…

asked Jan 06 '18 at 23:41 by Tac-Tics (reputation 1,360; badges 2, 9, 6)

117 votes, 12 answers

SVM using scikit learn runs endlessly and never completes execution

I am trying to run SVR using scikit-learn (Python) on a training dataset that has 595,605 rows and 5 columns (features), while the test dataset has 397,070 rows. The data has been pre-processed and regularized. I am able to successfully run the test…

asked Aug 18 '14 at 10:46 by tejaskhot (reputation 4,065; badges 7, 20, 18)

115 votes, 5 answers

Why do cost functions use the square error?

I'm just getting started with some machine learning, and until now I have been dealing with linear regression over one variable. I have learnt that there is a hypothesis, which is: $h_\theta(x)=\theta_0+\theta_1x$. To find out good values for the…

asked Feb 10 '16 at 21:52 by Golo Roden (reputation 1,323; badges 2, 9, 6)

114 votes, 11 answers

Choosing a learning rate

I'm currently working on implementing Stochastic Gradient Descent, SGD, for neural nets using back-propagation, and while I understand its purpose I have some questions about how to choose values for the learning rate. Is the learning rate related…

asked Jun 16 '14 at 18:08 by ragingSloth (reputation 1,824; badges 3, 14, 15)

112 votes, 9 answers

When should I use Gini Impurity as opposed to Information Gain (Entropy)?

Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)? Which metric is better to use in different scenarios while using decision trees?

asked Feb 12 '16 at 22:05 by Krish Mahajan (reputation 1,221; badges 2, 9, 4)

111 votes, 5 answers

Backprop Through Max-Pooling Layers?

This is a small conceptual question that's been nagging me for a while: How can we back-propagate through a max-pooling layer in a neural network? I came across max-pooling layers while going through this tutorial for Torch 7's nn library. The…

asked May 12 '16 at 08:38 by shinvu (reputation 1,240; badges 2, 9, 7)

104 votes, 4 answers

What is the positional encoding in the transformer model?

I'm trying to read and understand the paper "Attention Is All You Need", and in it there is a picture: I don't know what positional encoding is. By listening to some YouTube videos I've found out that it is an embedding having both meaning and…

asked Apr 28 '19 at 14:43 by Peyman (reputation 1,143; badges 2, 8, 8)

98 votes, 4 answers

Advantages of AUC vs standard accuracy

I was starting to look into area under the curve (AUC) and am a little confused about its usefulness. When first explained to me, AUC seemed to be a great measure of performance, but in my research I've found that some claim its advantage is mostly…

asked Jul 22 '14 at 03:43 by aidankmcl (reputation 1,083; badges 1, 8, 6)

95 votes, 10 answers

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

I got a ValueError when predicting test data using a RandomForest model. My code:
clf = RandomForestClassifier(n_estimators=10, max_depth=6, n_jobs=1, verbose=2)
clf.fit(X_fit, y_fit)
df_test.fillna(df_test.mean())
X_test = df_test.values
y_pred =…

asked May 26 '16 at 04:13 by Edamame (reputation 2,745; badges 5, 24, 33)

94 votes, 12 answers

How big is big data?

Lots of people use the term big data in a rather commercial way, as a means of indicating that large datasets are involved in the computation, and therefore potential solutions must have good performance. Of course, big data always carry associated…

asked May 14 '14 at 03:56 by Rubens (reputation 4,107; badges 5, 23, 42)

90 votes, 7 answers

In supervised learning, why is it bad to have correlated features?

I read somewhere that if we have features that are too correlated, we have to remove one, as this may worsen the model. It is clear that correlated features carry the same information, so it is logical to remove one of them. But I…

asked Nov 07 '17 at 14:37 by Spider (reputation 1,279; badges 1, 12, 12)

89 votes, 1 answer

When to use (He or Glorot) normal initialization over uniform init? And what are its effects with Batch Normalization?

I knew that Residual Network (ResNet) made He normal initialization popular. In ResNet, He normal initialization is used, while the first layer uses He uniform initialization. I've looked through the ResNet paper and "Delving Deep into Rectifiers"…

asked Jul 28 '16 at 17:12 by Rizky Luthfianto (reputation 2,206; badges 2, 19, 22)
