Most Popular
1500 questions
36
votes
1 answer
Why is xgboost so much faster than sklearn GradientBoostingClassifier?
I'm trying to train a gradient boosting model over 50k examples with 100 numeric features. XGBClassifier handles 500 trees within 43 seconds on my machine, while GradientBoostingClassifier handles only 10 trees (!) in 1 minute and 2 seconds :( I…

ihadanny
- 1,357
- 2
- 11
- 19
35
votes
3 answers
xgboost: give more importance to recent samples
Is there a way to add more importance to points which are more recent when analyzing data with xgboost?

kilojoules
- 453
- 1
- 4
- 6
35
votes
9 answers
Why is it wrong to train and test a model on the same dataset?
What are the pitfalls of doing so and why is it a bad practice? Is it possible that the model starts to learn the images "by heart" instead of understanding the underlying logic?

karalis1
- 461
- 1
- 5
- 8
35
votes
3 answers
Does modeling with Random Forests require cross-validation?
As far as I've seen, opinions tend to differ about this. Best practice would certainly dictate using cross-validation (especially if comparing RFs with other algorithms on the same dataset). On the other hand, the original source states that the…

neuron
- 664
- 1
- 6
- 9
35
votes
2 answers
What is/are the default filters used by Keras Convolution2d()?
I am pretty new to neural networks, but I understand linear algebra and the mathematics of convolution pretty decently.
I am trying to understand the example code I find in various places on the net for training a Keras convolutional NN with MNIST…

ChrisFal
- 453
- 1
- 4
- 5
35
votes
6 answers
Merging multiple data frames row-wise in PySpark
I have 10 data frames pyspark.sql.dataframe.DataFrame, obtained from randomSplit as (td1, td2, td3, td4, td5, td6, td7, td8, td9, td10) = td.randomSplit([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1], seed = 100) Now I want to join 9 td's into a single…

krishna Prasad
- 1,147
- 1
- 14
- 23
35
votes
4 answers
Quick guide into training highly imbalanced data sets
I have a classification problem with approximately 1000 positive and 10000 negative samples in the training set, so the data set is quite unbalanced. A plain random forest just marks all test samples as the majority class.
Some good answers…

IgorS
- 5,474
- 11
- 31
- 43
35
votes
1 answer
What is the best Keras model for multi-class classification?
I am working on research where I need to classify one of three events, WINNER=(win, draw, lose)
WINNER LEAGUE HOME AWAY MATCH_HOME MATCH_DRAW MATCH_AWAY MATCH_U2_50 MATCH_O2_50
3 13 550 571 1.86 3.34 …

SpanishBoy
- 557
- 1
- 5
- 11
34
votes
3 answers
Is it necessary to normalize data for XGBoost?
MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…

user781486
- 1,385
- 2
- 16
- 19
34
votes
4 answers
Gumbel-Softmax trick vs Softmax with temperature
From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning).
Many papers and articles describe it as a way…

4-bit
- 441
- 1
- 4
- 3
34
votes
3 answers
What's the difference between Attention and Self-Attention? What problems does each solve that the other can't?
As stated in the question above, is there a difference between the attention and self-attention mechanisms? Additionally, can anybody share tips and tricks on how a self-attention mechanism can be implemented in a CNN?

Pratik.S
- 473
- 1
- 4
- 9
34
votes
5 answers
What are the use cases for Apache Spark vs Hadoop
With Hadoop 2.0 and YARN, Hadoop is supposedly no longer tied only to map-reduce solutions. With that advancement, what are the use cases for Apache Spark vs Hadoop, considering both sit atop HDFS? I've read through the introduction documentation for…

idclark
- 521
- 1
- 5
- 6
34
votes
4 answers
How to use LeakyRelu as an activation function in a sequence DNN in Keras? When does it perform better than Relu?
How do you use LeakyRelu as an activation function in a sequence DNN in Keras?
If I want to write something similar to:
model = Sequential()
model.add(Dense(90, activation='LeakyRelu'))
What is the solution? Put LeakyRelu similar to Relu?
Second…

user10296606
- 1,834
- 5
- 17
- 31
34
votes
4 answers
When to use cosine similarity over Euclidean similarity
In NLP, people tend to use cosine similarity to measure document/text distances. I want to hear what people think of the following two scenarios, and which to pick: cosine similarity or Euclidean?
Overview of the task set: The task is to compute…

Logan
- 463
- 1
- 4
- 8
34
votes
2 answers
Are there any rules for choosing the size of a mini-batch?
When training neural networks, one hyperparameter is the size of a minibatch. Common choices are 32, 64, and 128 elements per mini batch.
Are there any rules/guidelines on how big a mini-batch should be? Or any publications which investigate the…

Martin Thoma
- 18,880
- 35
- 95
- 169
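
As a small illustration for the mini-batch question above, here is a minimal sketch in plain Python of how the batch size changes the number of batches (and hence parameter updates) per epoch. The helper name `iter_minibatches`, the toy dataset of 1000 items, and the fixed seed are assumptions made for this example; it does not come from the question and it is not a rule for choosing a size.

```python
import math
import random


def iter_minibatches(samples, batch_size, seed=0):
    """Yield shuffled mini-batches; the last one may be smaller.

    Hypothetical helper for illustration only.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # reshuffle the sample order once per call
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


data = list(range(1000))  # toy stand-in for a training set (assumption)

# The common choices named in the question: 32, 64 and 128.
for batch_size in (32, 64, 128):
    batches = list(iter_minibatches(data, batch_size))
    # One update per batch, so the count is ceil(N / batch_size).
    assert len(batches) == math.ceil(len(data) / batch_size)
    print(batch_size, len(batches), len(batches[-1]))
```

With 1000 samples, a batch size of 32 gives 32 batches per epoch and a batch size of 128 gives 8, so a smaller batch means more (and noisier) updates per pass over the data.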