Most Popular

1500 questions
147 votes, 13 answers

Why do people prefer Pandas to SQL?

I've been using SQL since 1996, so I may be biased. I've used MySQL and SQLite 3 extensively, but have also used Microsoft SQL Server and Oracle. The vast majority of the operations I've seen done with Pandas can be done more easily with SQL. This…
vy32
132 votes, 1 answer

How to get the correlation between two categorical variables, and between a categorical variable and a continuous variable?

I am building a regression model and I need to calculate the following to check for correlations:
  • Correlation between 2 multi-level categorical variables
  • Correlation between a multi-level categorical variable and a continuous variable
  • VIF(variance…
GeorgeOfTheRF
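Two measures often used for these cases are Cramér's V (categorical vs. categorical, built on the chi-squared statistic) and the correlation ratio η (categorical vs. continuous). Below is a minimal pure-Python sketch, assuming the data arrive as plain lists; `cramers_v` and `correlation_ratio` are hypothetical names rather than a library API, and this Cramér's V has no small-sample bias correction.

```python
import math
from collections import Counter

def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction)."""
    n = len(x)
    x_counts = Counter(x)
    y_counts = Counter(y)
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for a, na in x_counts.items():
        for b, nb in y_counts.items():
            expected = na * nb / n          # count expected under independence
            observed = joint[(a, b)]        # Counter returns 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0                          # a constant variable carries no association
    return math.sqrt(chi2 / (n * (k - 1)))

def correlation_ratio(categories, values):
    """Correlation ratio (eta): share of variance explained by group means."""
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)
    mean_all = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - mean_all) ** 2 for g in groups.values())
    ss_total = sum((v - mean_all) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    return math.sqrt(ss_between / ss_total)
```

Both return values in [0, 1], with 0 meaning no association, which makes them easier to compare side by side than a raw chi-squared statistic.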
127 votes, 14 answers

Python vs R for machine learning

I'm just starting to develop a machine learning application for academic purposes. I'm currently using R and training myself in it. However, in a lot of places, I have seen people using Python. What are people using in academia and industry, and…
user721
124 votes, 2 answers

Training an RNN with examples of different lengths in Keras

I am trying to get started learning about RNNs and I'm using Keras. I understand the basic premise of vanilla RNN and LSTM layers, but I'm having trouble understanding a certain technical point for training. In the Keras documentation, it says the…
Tac-Tics
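A common approach is to pad every sequence in a batch to the same length and carry a mask marking which time steps are real, so that masking-aware layers can ignore the padding. Here is a minimal pure-Python sketch of that idea using a hypothetical helper `pad_batch` that pads at the end. Keras ships its own utilities for this (a `pad_sequences` helper and a `Masking` layer), and their defaults may differ; for example, `pad_sequences` pads at the front unless told otherwise.

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad sequences to the batch's max length; return (padded, mask).

    mask[i][t] is True for real time steps and False for padding.
    """
    max_len = max(len(s) for s in sequences)
    padded, mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        padded.append(list(s) + [pad_value] * n_pad)   # post-padding
        mask.append([True] * len(s) + [False] * n_pad)
    return padded, mask

padded, mask = pad_batch([[1, 2, 3], [4]])
print(padded)  # [[1, 2, 3], [4, 0.0, 0.0]]
print(mask)    # [[True, True, True], [True, False, False]]
```

Another option is to group sequences of equal length into the same batch, which avoids padding within a batch entirely at the cost of a more complex data pipeline.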
117 votes, 12 answers

SVM using scikit-learn runs endlessly and never completes execution

I am trying to run SVR using scikit-learn (Python) on a training dataset that has 595605 rows and 5 columns (features), while the test dataset has 397070 rows. The data has been pre-processed and regularized. I am able to successfully run the test…
tejaskhot
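This may explain why the fit appears to hang. scikit-learn's documentation for `SVR` notes that kernel SVM fit time grows more than quadratically with the number of samples. One rough way to see the scale is to compute the size of a dense n × n kernel matrix. The sketch below only does that arithmetic, assuming 8-byte floats. libsvm does not actually materialize the full matrix (it computes entries on demand with a bounded cache), so treat this as an illustration of quadratic growth, not a memory estimate for the real fit.

```python
def dense_kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Bytes for a full n x n matrix of kernel values (an illustrative upper bound)."""
    return n_samples * n_samples * bytes_per_entry

for n in (10_000, 100_000, 595_605):
    gib = dense_kernel_matrix_bytes(n) / 2**30
    print(f"{n:>7,} samples -> {gib:,.1f} GiB")
```

The same scikit-learn documentation suggests `LinearSVR` or `SGDRegressor` for large datasets. Fitting on a random subsample first is another way to gauge how long the full run would take.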
115 votes, 5 answers

Why do cost functions use the square error?

I'm just getting started with some machine learning, and until now I have been dealing with linear regression over one variable. I have learnt that there is a hypothesis, which is: $h_\theta(x)=\theta_0+\theta_1x$ To find out good values for the…
Golo Roden
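With the hypothesis $h_\theta(x)=\theta_0+\theta_1x$, the usual squared-error cost is $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$. Squaring keeps every term non-negative and penalizes large errors more than small ones. It also gives a smooth cost (convex in $\theta$ for this linear hypothesis) whose gradient is proportional to the error; the $\frac{1}{2}$ cancels the 2 from differentiation. Below is a minimal sketch of the cost, its gradient, and batch gradient descent. The function names and the toy data in the usage line are hypothetical.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = (1 / 2m) * sum((h(x) - y)**2) with h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient(theta0, theta1, xs, ys):
    """Partial derivatives of J with respect to theta0 and theta1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d_theta0, d_theta1

def fit(xs, ys, lr=0.1, steps=1000):
    """Batch gradient descent on J starting from (0, 0)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        d0, d1 = gradient(theta0, theta1, xs, ys)
        theta0 -= lr * d0
        theta1 -= lr * d1
    return theta0, theta1

print(fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to (0, 2) for y = 2x
```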
114 votes, 11 answers

Choosing a learning rate

I'm currently working on implementing Stochastic Gradient Descent (SGD) for neural nets using back-propagation, and while I understand its purpose, I have some questions about how to choose values for the learning rate. Is the learning rate related…
ragingSloth
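One way to build intuition for the learning rate $\eta$ is to run plain gradient descent on the one-dimensional function $f(w)=w^2$, whose gradient is $2w$. Each update is $w \leftarrow (1-2\eta)w$. The iteration therefore converges for $0<\eta<1$, flips sign on every step for $0.5<\eta<1$, and diverges for $\eta>1$. The sketch below is a toy illustration under those assumptions (the function name is hypothetical), not a recipe for choosing $\eta$ in a neural net.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize f(w) = w**2 with a fixed learning rate; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w        # derivative of f(w) = w**2
        w = w - lr * grad
    return w

for lr in (0.01, 0.1, 0.5, 0.9, 1.1):
    print(lr, gradient_descent(lr))
```

Too small a rate barely moves after 20 steps, a moderate one converges, and one past the stability limit blows up. The same qualitative behaviour is what people look for in training-loss curves.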
112 votes, 9 answers

When should I use Gini Impurity as opposed to Information Gain (Entropy)?

Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)? Which metric is better to use in different scenarios while using decision trees?
Krish Mahajan
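Both criteria measure how mixed the class labels in a node are. Information gain is the drop in impurity from a parent node to its size-weighted children, and either measure can be plugged in. Below is a minimal pure-Python sketch with hypothetical function names. In scikit-learn the choice is made through the `criterion` parameter of `DecisionTreeClassifier` (`"gini"` or `"entropy"`).

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """-sum(p_k * log2(p_k)) over class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

parent = ["a", "a", "b", "b"]
split = [["a", "a"], ["b", "b"]]
print(information_gain(parent, split))                  # entropy-based gain
print(information_gain(parent, split, gini_impurity))   # Gini-based gain
```

Gini avoids the logarithm, so it is slightly cheaper to compute. In practice the two criteria often pick similar splits.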
111 votes, 5 answers

Backprop Through Max-Pooling Layers?

This is a small conceptual question that's been nagging me for a while: How can we back-propagate through a max-pooling layer in a neural network? I came across max-pooling layers while going through this tutorial for Torch 7's nn library. The…
shinvu
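In max pooling, only the input that produced each window's maximum affects the output. During backpropagation, the incoming gradient for each output is routed entirely to that input, and every other input in the window receives zero. Below is a minimal 1-D sketch, assuming non-overlapping windows (stride equal to the window size). Trailing elements that do not fill a window are dropped, and the function names are hypothetical.

```python
def maxpool1d_forward(x, size):
    """Return pooled outputs and the input index of each window's max."""
    out, argmax = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        j = max(range(size), key=lambda i: window[i])  # first max wins on ties
        out.append(window[j])
        argmax.append(start + j)
    return out, argmax

def maxpool1d_backward(grad_out, argmax, input_len):
    """Send each output gradient to the input that was the max; others get 0."""
    grad_in = [0.0] * input_len
    for g, idx in zip(grad_out, argmax):
        grad_in[idx] += g   # += would also handle overlapping windows
    return grad_in

out, argmax = maxpool1d_forward([1, 3, 2, 0, 5, 4], size=2)
print(out, argmax)                                    # [3, 2, 5] [1, 2, 4]
print(maxpool1d_backward([10, 20, 30], argmax, 6))    # [0.0, 10.0, 20.0, 0.0, 30.0, 0.0]
```

Storing the argmax indices during the forward pass (a "switch" per window) is what makes the backward pass a simple scatter.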
104 votes, 4 answers

What is the positional encoding in the transformer model?

I'm trying to read and understand the paper "Attention Is All You Need", and in it there is a picture [image not reproduced]. I don't know what positional encoding is. By listening to some YouTube videos, I've found out that it is an embedding having both meaning and…
Peyman
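In "Attention Is All You Need", positional encodings are vectors added to the token embeddings so the model can use word order. The paper's sinusoidal version is $PE_{(pos,2i)}=\sin(pos/10000^{2i/d_{model}})$ and $PE_{(pos,2i+1)}=\cos(pos/10000^{2i/d_{model}})$. Below is a minimal pure-Python sketch of that table, assuming an even $d_{model}$; the function name is hypothetical.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: sin at even indices, cos at odd indices."""
    if d_model % 2 != 0:
        raise ValueError("d_model is assumed to be even in this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))  # lower frequency as i grows
            row.append(math.sin(angle))   # dimension 2i
            row.append(math.cos(angle))   # dimension 2i + 1
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=8)
print(pe[0])  # position 0: alternating 0.0 and 1.0
```

Each dimension pair oscillates at a different frequency, so every position gets a distinct pattern. Adding the row for position `pos` to that token's embedding injects order information without any learned parameters.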
98 votes, 4 answers

Advantages of AUC vs standard accuracy

I was starting to look into the area under the curve (AUC) and am a little confused about its usefulness. When it was first explained to me, AUC seemed to be a great measure of performance, but in my research I've found that some claim its advantage is mostly…
aidankmcl
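ROC AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as half. It therefore does not depend on a decision threshold or directly on the class ratio, whereas accuracy depends on both. The toy sketch below uses hypothetical helper names and made-up labels. It shows a useless constant scorer that still reaches 90% accuracy on a 9:1 imbalanced set while its AUC is 0.5.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5. O(P*N)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy, made-up data: 9 negatives, 1 positive.
y_true = [0] * 9 + [1]
constant_scores = [0.0] * 10                 # a model that ignores its input
y_pred = [1 if s >= 0.5 else 0 for s in constant_scores]
print(accuracy(y_true, y_pred))              # 0.9
print(roc_auc(y_true, constant_scores))      # 0.5
```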
95 votes, 10 answers

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

I got a ValueError when predicting test data using a RandomForest model. My code: clf = RandomForestClassifier(n_estimators=10, max_depth=6, n_jobs=1, verbose=2) clf.fit(X_fit, y_fit) df_test.fillna(df_test.mean()) X_test = df_test.values y_pred =…
Edamame
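One likely culprit in the snippet as quoted is `df_test.fillna(df_test.mean())`, which returns a new DataFrame by default rather than modifying `df_test`. Unless the result is assigned back (`df_test = df_test.fillna(df_test.mean())`), the NaNs remain. To locate offending values before calling `predict`, a minimal pure-Python check could look like the sketch below, assuming the features are available as rows of floats. `find_bad_values` is a hypothetical helper, and the limit is the largest finite IEEE 754 single-precision value.

```python
import math

FLOAT32_MAX = 3.4028234663852886e38  # largest finite float32 value

def find_bad_values(rows):
    """Return (row, column, value) for NaN, +/-inf, or magnitudes beyond float32."""
    bad = []
    for r, row in enumerate(rows):
        for c, v in enumerate(row):
            if math.isnan(v) or math.isinf(v) or abs(v) > FLOAT32_MAX:
                bad.append((r, c, v))
    return bad

rows = [[1.0, float("nan")], [float("inf"), 1e39], [2.0, -3.0]]
for r, c, v in find_bad_values(rows):
    print(f"row {r}, column {c}: {v}")
```

Note also that a column that is entirely NaN has a NaN mean, so filling with the mean would leave it unchanged.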
94 votes, 12 answers

How big is big data?

Lots of people use the term big data in a rather commercial way, as a means of indicating that large datasets are involved in the computation, and therefore potential solutions must have good performance. Of course, big data always carry associated…
Rubens
90 votes, 7 answers

In supervised learning, why is it bad to have correlated features?

I read somewhere that if we have features that are too correlated, we have to remove one, as this may worsen the model. It is clear that correlated features mean that they bring the same information, so it is logical to remove one of them. But I…
Spider
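One concrete problem with strongly correlated features shows up in linear models. When one feature is an exact copy of another, the coefficients are no longer uniquely determined, because any pair of weights with the same sum gives identical predictions. The fitted weights then become unstable and hard to interpret, even though the predictions are unaffected. Below is a tiny illustration with made-up numbers and a hypothetical `predict` helper.

```python
def predict(weights, row):
    """Linear model without intercept: dot product of weights and features."""
    return sum(w * v for w, v in zip(weights, row))

x1 = [1.0, 2.0, 3.0, 4.0]
rows = [(v, v) for v in x1]  # x2 is an exact copy of x1

# Very different weight vectors, identical predictions (all equal 2 * x1).
for weights in [(2.0, 0.0), (1.0, 1.0), (5.0, -3.0)]:
    print(weights, [predict(weights, r) for r in rows])
```

With near-perfect rather than exact correlation, the same effect appears as high variance in the estimated coefficients. This is what measures such as VIF are designed to flag.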
89 votes, 1 answer

When to use (He or Glorot) normal initialization over uniform init? And what are its effects with Batch Normalization?

I know that the Residual Network (ResNet) made He normal initialization popular. In ResNet, He normal initialization is used, while the first layer uses He uniform initialization. I've looked through the ResNet paper and "Delving Deep into Rectifiers"…
Rizky Luthfianto
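The normal and uniform variants of each scheme are designed to give the weights the same variance. A uniform distribution on $[-a, a]$ has variance $a^2/3$, so He uniform with $a=\sqrt{6/n_{in}}$ matches He normal's $\sigma^2=2/n_{in}$. Likewise, Glorot uniform with $a=\sqrt{6/(n_{in}+n_{out})}$ matches Glorot normal's $\sigma^2=2/(n_{in}+n_{out})$, where $n_{in}$ and $n_{out}$ are the layer's fan-in and fan-out. Below is a minimal sketch of the four scales and a sampler, with hypothetical function names. It uses an untruncated normal, whereas library implementations (for example Keras's `he_normal`) may sample from a truncated normal instead.

```python
import math
import random

def he_normal_std(fan_in):
    return math.sqrt(2.0 / fan_in)                 # variance 2 / fan_in

def he_uniform_limit(fan_in):
    return math.sqrt(6.0 / fan_in)                 # a**2 / 3 == 2 / fan_in

def glorot_normal_std(fan_in, fan_out):
    return math.sqrt(2.0 / (fan_in + fan_out))     # variance 2 / (fan_in + fan_out)

def glorot_uniform_limit(fan_in, fan_out):
    return math.sqrt(6.0 / (fan_in + fan_out))     # a**2 / 3 matches the normal variance

def init_matrix(fan_in, fan_out, scheme="he_normal", seed=0):
    """Return a fan_in x fan_out weight matrix as nested lists (hypothetical helper)."""
    rng = random.Random(seed)
    if scheme == "he_normal":
        std = he_normal_std(fan_in)
        draw = lambda: rng.gauss(0.0, std)
    elif scheme == "he_uniform":
        a = he_uniform_limit(fan_in)
        draw = lambda: rng.uniform(-a, a)
    elif scheme == "glorot_normal":
        std = glorot_normal_std(fan_in, fan_out)
        draw = lambda: rng.gauss(0.0, std)
    elif scheme == "glorot_uniform":
        a = glorot_uniform_limit(fan_in, fan_out)
        draw = lambda: rng.uniform(-a, a)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [[draw() for _ in range(fan_out)] for _ in range(fan_in)]
```

Because the variances match by construction, the choice between the normal and uniform variants mostly changes the shape of the distribution (uniform weights are bounded, normal ones have tails), not its scale.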