Highest Voted Questions - Data Science Stack Exchange

36 votes, 1 answer

Why is xgboost so much faster than sklearn GradientBoostingClassifier?

I'm trying to train a gradient boosting model over 50k examples with 100 numeric features. XGBClassifier handles 500 trees within 43 seconds on my machine, while GradientBoostingClassifier handles only 10 trees(!) in 1 minute and 2 seconds :( I…

asked Mar 29 '16 at 14:14 by ihadanny (reputation 1,357; 2 gold, 11 silver, 19 bronze badges)

35 votes, 3 answers

xgboost: give more importance to recent samples

Is there a way to add more importance to points which are more recent when analyzing data with xgboost?

asked Dec 22 '15 at 17:19 by kilojoules (reputation 453; 1 gold, 4 silver, 6 bronze badges)
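
One common way to express "more importance" in gradient boosting is a per-sample weight. The sketch below assumes exponential decay by age, with a hypothetical `half_life_days` knob: the newest sample gets weight 1.0 and each half-life of age halves it. The resulting list could then be passed as `sample_weight` to `XGBClassifier.fit` (or as `weight` to `xgboost.DMatrix`); the decay shape and the half-life value are illustrative assumptions, not a recommendation from the thread.

```python
import math
from datetime import date


def recency_weights(dates, half_life_days):
    """Exponential-decay weights by age relative to the newest date.

    The newest sample gets 1.0; a sample `half_life_days` older gets 0.5.
    `half_life_days` is a hypothetical tuning knob, not an xgboost parameter.
    """
    newest = max(dates)
    decay = math.log(2) / half_life_days
    return [math.exp(-decay * (newest - d).days) for d in dates]


dates = [date(2015, 12, 1), date(2015, 12, 11), date(2015, 12, 21)]
weights = recency_weights(dates, half_life_days=10)
print([round(w, 3) for w in weights])  # [0.25, 0.5, 1.0]
```

A shorter half-life makes the model lean harder on recent rows; a very long one approaches uniform weighting.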

35 votes, 9 answers

Why is it wrong to train and test a model on the same dataset?

What are the pitfalls of doing so and why is it a bad practice? Is it possible that the model starts to learn the images "by heart" instead of understanding the underlying logic?

asked Dec 13 '20 at 14:11 by karalis1 (reputation 461; 1 gold, 5 silver, 8 bronze badges)
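
The "by heart" concern can be made concrete with a deliberately memorising model. In the sketch below, a 1-nearest-neighbour classifier is fit on synthetic data whose labels are pure coin flips (both choices are assumptions made for illustration). Scored on its own training points it looks perfect, even though there is nothing to learn; only the held-out half reveals chance-level performance.

```python
import random


def one_nn_predict(train, x):
    """1-nearest-neighbour: return the label of the training point closest to x."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(train, data):
    """Fraction of (x, y) pairs in data whose predicted label equals y."""
    return sum(one_nn_predict(train, x) == y for x, y in data) / len(data)


rng = random.Random(0)
# Synthetic data: x is uniform noise and y is a coin flip, so there is no
# underlying rule to learn.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(200)]
train, test = data[:100], data[100:]

train_acc = accuracy(train, train)  # every point is its own nearest neighbour
test_acc = accuracy(train, test)    # expected to land near 0.5 (chance)
print(train_acc, test_acc)
```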

35 votes, 3 answers

Does modeling with Random Forests require cross-validation?

As far as I've seen, opinions tend to differ about this. Best practice would certainly dictate using cross-validation (especially if comparing RFs with other algorithms on the same dataset). On the other hand, the original source states that the…

asked Jul 20 '15 at 13:42 by neuron (reputation 664; 1 gold, 6 silver, 9 bronze badges)

35 votes, 2 answers

What is/are the default filters used by Keras Convolution2d()?

I am pretty new to neural networks, but I understand linear algebra and the mathematics of convolution pretty decently. I am trying to understand the example code I find in various places on the net for training a Keras convolutional NN with MNIST…

asked Jan 23 '17 at 08:07 by ChrisFal (reputation 453; 1 gold, 4 silver, 5 bronze badges)
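
In Keras the convolution kernels are not fixed, hand-designed filters: they start out random and are learned during training. For `Conv2D` (called `Convolution2D` in older Keras), the default kernel initializer is `"glorot_uniform"`, which draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)); for a kernel of shape (kh, kw, in_channels, out_channels), fan_in = kh·kw·in_channels and fan_out = kh·kw·out_channels. The pure-Python sketch below only reproduces that arithmetic for a hypothetical 3×3 layer with 1 input channel (as for MNIST) and 32 filters; it does not call Keras.

```python
import math
import random


def glorot_uniform_kernel(kh, kw, in_ch, out_ch, rng):
    """Return a nested-list kernel of shape (kh, kw, in_ch, out_ch) and its limit."""
    receptive_field = kh * kw
    fan_in = receptive_field * in_ch
    fan_out = receptive_field * out_ch
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    kernel = [[[[rng.uniform(-limit, limit) for _ in range(out_ch)]
                for _ in range(in_ch)]
               for _ in range(kw)]
              for _ in range(kh)]
    return kernel, limit


kernel, limit = glorot_uniform_kernel(3, 3, 1, 32, random.Random(0))
print(round(limit, 4))  # 0.1421, i.e. sqrt(6 / (9 + 288))
```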

35 votes, 6 answers

Merging multiple data frames row-wise in PySpark

I have 10 data frames pyspark.sql.dataframe.DataFrame, obtained from randomSplit as (td1, td2, td3, td4, td5, td6, td7, td8, td9, td10) = td.randomSplit([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1], seed = 100) Now I want to join 9 td's into a single…

asked Apr 22 '16 at 04:27 by krishna Prasad (reputation 1,147; 1 gold, 14 silver, 23 bronze badges)

35 votes, 4 answers

Quick guide into training highly imbalanced data sets

I have a classification problem with approximately 1000 positive and 10000 negative samples in the training set. So this data set is quite unbalanced. A plain random forest just tries to mark all test samples as the majority class. Some good answers…

asked Sep 12 '14 at 15:20 by IgorS (reputation 5,474; 11 gold, 31 silver, 43 bronze badges)
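
A common starting point for a 1:10 imbalance is to reweight the classes instead of resampling. The sketch below computes inverse-frequency weights with n_samples / (n_classes × count), the formula scikit-learn documents for `class_weight="balanced"` (accepted, for example, by `RandomForestClassifier`). With the counts from the question, each class ends up carrying the same total weight: 1000 × 5.5 = 10000 × 0.55 = 5500.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [1] * 1000 + [0] * 10000  # 1000 positive, 10000 negative, as in the question
weights = balanced_class_weights(labels)
print(weights)  # {1: 5.5, 0: 0.55}
```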

35 votes, 1 answer

What is the best Keras model for multi-class classification?

I am working on research where I need to classify into one of three events, WINNER=(win, draw, lose). WINNER LEAGUE HOME AWAY MATCH_HOME MATCH_DRAW MATCH_AWAY MATCH_U2_50 MATCH_O2_50 3 13 550 571 1.86 3.34 …

asked Feb 01 '16 at 15:18 by SpanishBoy (reputation 557; 1 gold, 5 silver, 11 bronze badges)
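
For a three-way target such as WINNER = (win, draw, lose), a Keras classifier typically ends in a 3-unit softmax layer trained with categorical cross-entropy, which expects one-hot targets. A minimal pure-Python sketch of that encoding follows; the class order is an arbitrary assumption, and Keras also offers `keras.utils.to_categorical` for integer labels.

```python
CLASSES = ["win", "draw", "lose"]  # column order is an arbitrary choice


def one_hot(label, classes=CLASSES):
    """Vector with 1.0 at the label's position and 0.0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1.0 if c == label else 0.0 for c in classes]


targets = [one_hot(y) for y in ["win", "lose", "draw"]]
print(targets)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```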

34 votes, 3 answers

Is it necessary to normalize data for XGBoost?

MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…

asked Sep 28 '19 at 13:35 by user781486 (reputation 1,385; 2 gold, 16 silver, 19 bronze badges)
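
The reasoning in the question can be checked directly: a tree split only asks whether a value is above or below a threshold, so any strictly increasing transform such as min-max scaling preserves the order of the values and therefore the set of possible partitions. The toy one-feature regression stump below (a simplification, not XGBoost's gain formula; the data are made up) picks exactly the same split on raw and scaled inputs. This argument covers XGBoost's tree boosters; the linear booster (`booster="gblinear"`) is a different model, where feature scale can matter.

```python
def min_max_scale(values):
    """Map values to [0, 1]; strictly increasing when the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def sse(ys, idx):
    """Sum of squared errors of ys[idx] around their mean."""
    mean = sum(ys[i] for i in idx) / len(idx)
    return sum((ys[i] - mean) ** 2 for i in idx)


def best_split(xs, ys):
    """Indices sent left by the lowest-error threshold of a one-feature stump."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_cost, best_left = float("inf"), None
    for k in range(1, len(order)):  # threshold between order[k-1] and order[k]
        cost = sse(ys, order[:k]) + sse(ys, order[k:])
        if cost < best_cost:
            best_cost, best_left = cost, set(order[:k])
    return best_left


xs = [3.0, 120.0, 45.0, 7.5, 88.0, 60.0]  # made-up raw feature values
ys = [0, 1, 0, 0, 1, 1]
print(best_split(xs, ys))                                       # {0, 2, 3}
print(best_split(xs, ys) == best_split(min_max_scale(xs), ys))  # True
```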

34 votes, 4 answers

Gumbel-Softmax trick vs Softmax with temperature

From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning). Many papers and articles describe it as a way…

asked Aug 29 '19 at 10:30 by 4-bit (reputation 441; 1 gold, 4 silver, 3 bronze badges)
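
The distinction can be seen in a few lines. Softmax with temperature is a deterministic function of the logits, while Gumbel-Softmax first adds independent Gumbel(0, 1) noise g = -log(-log(u)), with u uniform on (0, 1), and then applies the same temperature softmax, so each call returns a random vector that is nearly one-hot at low temperature. The argmax of the noisy logits is distributed according to softmax(logits) (the Gumbel-max trick), which the frequency check below illustrates. This is a pure-Python sketch: the logits, the temperature of 0.1 and the sample count are illustrative choices, and an autodiff framework would be needed for the differentiable version.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw from Gumbel(0, 1) via -log(-log(u)) with u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined; resample in that (very rare) case
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_softmax_sample(logits, temperature, rng):
    """Perturb each logit with Gumbel noise, then apply temperature softmax."""
    noisy = [z + sample_gumbel(rng) for z in logits]
    return softmax(noisy, temperature)


logits = [math.log(0.2), math.log(0.3), math.log(0.5)]
print(softmax(logits))  # same output on every call: approximately [0.2, 0.3, 0.5]

rng = random.Random(0)
n = 20000
counts = [0, 0, 0]
for _ in range(n):
    sample = gumbel_softmax_sample(logits, temperature=0.1, rng=rng)
    counts[sample.index(max(sample))] += 1
freqs = [c / n for c in counts]
print(freqs)  # close to [0.2, 0.3, 0.5], up to sampling noise
```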

34 votes, 3 answers

What's the difference between Attention and Self-Attention? What problems does each one solve that the other can't?

As stated in the question above, is there a difference between the attention and self-attention mechanisms? Additionally, can anybody share tips and tricks on how a self-attention mechanism can be implemented in a CNN?

asked Apr 17 '19 at 10:39 by Pratik.S (reputation 473; 1 gold, 4 silver, 9 bronze badges)
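
A minimal way to see the difference: the attention function itself is the same, and what changes is where the queries, keys and values come from. In encoder-decoder ("cross") attention the queries come from one sequence (for example decoder states) and the keys and values from another (encoder states); in self-attention all three come from the same sequence, so every position attends to every position of its own input. The sketch uses scaled dot-product scoring and, as a simplifying assumption, omits the learned query/key/value projections a real layer would apply; the vectors are made up.

```python
import math


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output for this query: attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def self_attention(xs):
    """Queries, keys and values all come from the same sequence xs.
    (A real layer would first project xs with learned matrices; omitted here.)"""
    return attention(xs, xs, xs)


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up vectors
decoder_states = [[0.5, 0.5]]

cross = attention(decoder_states, encoder_states, encoder_states)
self_out = self_attention(encoder_states)
print(len(cross), len(self_out))  # 1 3: one output per query
```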

34 votes, 5 answers

What are the use cases for Apache Spark vs Hadoop?

With Hadoop 2.0 and YARN, Hadoop is supposedly no longer tied only to map-reduce solutions. With that advancement, what are the use cases for Apache Spark vs Hadoop, considering both sit atop HDFS? I've read through the introduction documentation for…

asked Jun 17 '14 at 20:48 by idclark (reputation 521; 1 gold, 5 silver, 6 bronze badges)

34 votes, 4 answers

How to use LeakyRelu as an activation function in a sequential DNN in Keras? When does it perform better than Relu?

How do you use LeakyRelu as an activation function in a sequential DNN in Keras? If I want to write something similar to: model = Sequential() model.add(Dense(90, activation='LeakyRelu')) What is the solution? Put LeakyRelu similar to Relu? Second…

asked Oct 02 '18 at 04:06 by user10296606 (reputation 1,834; 5 gold, 17 silver, 31 bronze badges)
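
LeakyReLU differs from ReLU only for negative inputs: instead of outputting 0 it outputs a small multiple alpha·x, so its gradient there is alpha rather than 0. That is the usual argument for it when many ReLU units stop updating; whether it actually performs better is an empirical question for the dataset at hand. In Keras it is commonly used as a separate `LeakyReLU` layer placed after a `Dense` layer that has no activation of its own. The pure-Python sketch below only compares the two functions; alpha = 0.01 is an illustrative value, not a library default.

```python
def relu(x):
    """ReLU: x for positive inputs, 0 otherwise (gradient 0 for x < 0)."""
    return x if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    """LeakyReLU: x for positive inputs, alpha * x otherwise (gradient alpha for x < 0).
    alpha=0.01 is an illustrative choice."""
    return x if x > 0 else alpha * x


xs = [-2.0, -0.5, 0.0, 1.5]
print([relu(x) for x in xs])        # [0.0, 0.0, 0.0, 1.5]
print([leaky_relu(x) for x in xs])  # [-0.02, -0.005, 0.0, 1.5]
```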

34 votes, 4 answers

When to use cosine similarity over Euclidean similarity

In NLP, people tend to use cosine similarity to measure document/text distances. I want to hear what people think of the following two scenarios: which should I pick, cosine similarity or Euclidean? Overview of the task set: The task is to compute…

asked Feb 12 '18 at 13:31 by Logan (reputation 463; 1 gold, 4 silver, 8 bronze badges)
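
The usual argument for cosine similarity on text is that it depends only on the direction of the term vectors, not on their length, so a short document and a long one with the same word proportions look identical, while Euclidean distance treats them as far apart. A small pure-Python sketch with made-up term counts:

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


short_doc = [1, 2, 0]    # hypothetical term counts
long_doc = [10, 20, 0]   # same proportions, ten times as many words
print(cosine_similarity(short_doc, long_doc))   # 1.0 up to floating-point rounding
print(euclidean_distance(short_doc, long_doc))  # sqrt(405), about 20.12
```

Which one fits depends on whether document length should count as a difference. If the vectors are L2-normalised first, the two agree on rankings, since ||a - b||² = 2 - 2·cos(a, b) for unit vectors.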

34 votes, 2 answers

Are there any rules for choosing the size of a mini-batch?

When training neural networks, one hyperparameter is the size of a minibatch. Common choices are 32, 64, and 128 elements per mini batch. Are there any rules/guidelines on how big a mini-batch should be? Or any publications which investigate the…

asked Apr 17 '17 at 16:18 by Martin Thoma (reputation 18,880; 35 gold, 95 silver, 169 bronze badges)
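
Whatever batch size is chosen, it fixes how many parameter updates one pass over the data produces: ceil(N / B) for N examples and batch size B. Below is a minimal sketch of a shuffled mini-batch iterator, assuming the common convention of reshuffling every epoch and keeping a smaller final batch; the dataset is a placeholder. With N = 1000 and B = 64 it yields 16 updates per epoch, the last one on 40 examples.

```python
import math
import random


def iterate_minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches; the final batch is smaller when
    len(data) is not a multiple of batch_size."""
    indices = list(range(len(data)))
    rng.shuffle(indices)  # new order each time the generator is created (i.e. per epoch)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


data = list(range(1000))  # placeholder dataset of N = 1000 examples
batches = list(iterate_minibatches(data, batch_size=64, rng=random.Random(0)))
print(len(batches), math.ceil(len(data) / 64), len(batches[-1]))  # 16 16 40
```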
