Highest Voted Questions - Data Science Stack Exchange

62

votes

5 answers

RNN vs CNN at a high level

I've been thinking about the Recurrent Neural Networks (RNN) and their varieties and Convolutional Neural Networks (CNN) and their varieties. Would these two points be fair to say: Use CNNs to break a component (such as an image) into subcomponents…

asked May 06 '16 at 14:36

Larry Freeman

745
1
6
8

61

votes

10 answers

IDE alternatives for R programming (RStudio, IntelliJ IDEA, Eclipse, Visual Studio)

I use RStudio for R programming. I remember solid IDEs from other technology stacks, like Visual Studio or Eclipse. I have two questions: What IDEs other than RStudio are used (please consider providing a brief description of each)? Does…

asked Mar 18 '15 at 11:39

IgorS

5,474
11
31
43

61

votes

10 answers

How to deal with version control of large amounts of (binary) data

I am a PhD student in Geophysics and work with large amounts of image data (hundreds of GB, tens of thousands of files). I know svn and git fairly well and have come to value a project history, combined with the ability to easily work together and have…

asked Feb 13 '15 at 10:09

Johann

721
1
5
5

60

votes

5 answers

Neural networks: which cost function to use?

I am using TensorFlow for experiments, mainly with neural networks. Although I have now done quite a few experiments (XOR problem, MNIST, some regression stuff, ...), I struggle with choosing the "correct" cost function for specific problems because…

asked Jan 19 '16 at 11:48

daniel451

723
1
6
6

60

votes

9 answers

Is there any domain where Bayesian Networks outperform neural networks?

Neural networks get top results in Computer Vision tasks (see MNIST, ILSVRC, Kaggle Galaxy Challenge). They seem to outperform every other approach in Computer Vision. But there are also other tasks: Kaggle Molecular Activity Challenge Regression:…

asked Jan 17 '16 at 13:04

Martin Thoma

18,880
35
95
169

60

votes

8 answers

Does scikit-learn have a forward selection/stepwise regression algorithm?

I am working on a problem with too many features, and training my models takes way too long. I implemented a forward selection algorithm to choose features. However, I was wondering whether scikit-learn has a forward selection/stepwise regression…

asked Aug 07 '14 at 15:33

Maksud

725
1
7
6

60

votes

3 answers

How to fight underfitting in a deep neural net

When I started with artificial neural networks (NN) I thought I'd have to fight overfitting as the main problem. But in practice I can't even get my NN to pass the 20% error rate barrier. I can't even beat my score on random forest! I'm seeking some…

asked Jul 13 '14 at 09:04

lithuak

723
1
6
8

60

votes

2 answers

What is the difference between LeakyReLU and PReLU?

I thought both PReLU and Leaky ReLU are $$f(x) = \max(x, \alpha x) \qquad \text{ with } \alpha \in (0, 1)$$ Keras, however, has both functions in the docs. Source of LeakyReLU: return K.relu(inputs, alpha=self.alpha). Hence (see relu…


asked Apr 25 '17 at 11:58

Martin Thoma

18,880
35
95
169

59

votes

6 answers

Should I go for a 'balanced' dataset or a 'representative' dataset?

My 'machine learning' task is to separate benign Internet traffic from malicious traffic. In the real-world scenario, most (say 90% or more) of Internet traffic is benign. Thus I felt that I should choose a similar data setup for training my…

asked Jul 22 '14 at 12:29

pnp

693
1
6
10

59

votes

4 answers

Difference between OrdinalEncoder and LabelEncoder

I was going through the official scikit-learn documentation after reading a book on ML and came across the following: the documentation describes sklearn.preprocessing.OrdinalEncoder(), whereas in the book it was given…

asked Oct 07 '18 at 18:55

Saurabh Singh

733
1
6
8

59

votes

3 answers

How to set batch_size, steps_per_epoch, and validation_steps?

I am starting to learn CNNs using Keras. I am using the Theano backend. I don't understand how to set values for batch_size, steps_per_epoch, and validation_steps. What should be the value set to batch_size, steps_per_epoch, and validation_steps, if I…

asked Mar 30 '18 at 06:53

Ermene

693
1
6
6

59

votes

6 answers

Does XGBoost handle multicollinearity by itself?

I'm currently using XGBoost on a dataset with 21 features (selected from a list of some 150 features), which I then one-hot encoded to obtain ~98 features. A few of these 98 features are somewhat redundant, for example: a variable (feature) $A$ also…

asked Jul 02 '16 at 07:30

neural-nut

1,783
3
17
27

59

votes

5 answers

Number of parameters in an LSTM model

How many parameters does a single stacked LSTM have? The number of parameters imposes a lower bound on the number of training examples required and also influences the training time. Hence knowing the number of parameters is useful for training…

asked Mar 09 '16 at 11:14

wabbit

1,297
2
12
15

57

votes

8 answers

Why Is Overfitting Bad in Machine Learning?

Logic often states that by overfitting a model, its capacity to generalize is limited, though this might only mean that overfitting stops a model from improving after a certain complexity. Does overfitting cause models to become worse regardless of…

asked May 14 '14 at 18:09

blunders

1,932
2
15
19

57

votes

4 answers

What is the advantage of keeping batch size a power of 2?

While training models in machine learning, why is it sometimes advantageous to keep the batch size to a power of 2? I thought it would be best to use a size that is the largest fit in your GPU memory / RAM. This answer claims that for some packages,…

asked Jul 05 '17 at 05:43

James Bond

1,195
2
11
12
