Highest Voted Questions - Data Science Stack Exchange

38

votes

4 answers

Do Random Forests overfit?

I have been reading about Random Forests but I cannot find a definitive answer about the problem of overfitting. According to Breiman's original paper, they should not overfit when increasing the number of trees in the forest, but…

asked Aug 23 '14 at 16:54

papafe

595
1
5
9

37

votes

4 answers

Meaning of latent features?

I am learning about matrix factorization for recommender systems, and I see the term latent features occurring very frequently, but I am unable to understand what it means. I know what a feature is but I don't understand the idea of latent…

asked Jul 16 '14 at 09:24

Jack Twain

719
1
5
7

37

votes

7 answers

Organized processes to clean data

From my limited dabbling with data science using R, I realized that cleaning bad data is a very important part of preparing data for analysis. Are there any best practices or processes for cleaning data before processing it? If so, are there any…

asked May 14 '14 at 15:25

Jay Godse

471
5
7

37

votes

13 answers

What do you think of Data Science certifications?

I've now seen two data science certification programs - the Johns Hopkins one available at Coursera and the Cloudera one. I'm sure there are others out there. The Johns Hopkins set of classes is focused on R as a toolset, but covers a range of…

education

asked Jun 12 '14 at 10:52

Steve Kallestad

3,128
4
21
39

37

votes

6 answers

Sentence similarity prediction

I'm looking to solve the following problem: I have a set of sentences as my dataset, and I want to be able to type a new sentence and find the sentence in the dataset that the new one is most similar to. An example would look like: New…

asked Oct 22 '17 at 07:36

lte__

1,320
5
18
27

37

votes

2 answers

How to use the output of GridSearch?

I'm currently working with Python and scikit-learn for classification purposes, and doing some reading around GridSearch I thought this was a great way for optimising my estimator parameters to get the best results. My methodology is this: Split my…

asked Aug 01 '17 at 13:20

Dan Carter

1,732
1
11
26

37

votes

1 answer

RNN's with multiple features

I have a bit of self-taught knowledge working with Machine Learning algorithms (the basic Random Forest and Linear Regression type stuff). I decided to branch out and begin learning RNNs with Keras. When looking at most of the examples, which…

asked Feb 16 '17 at 19:35

Rjay155

1,215
2
12
9

36

votes

4 answers

What is a good way to transform Cyclic Ordinal attributes?

I have an 'hour' field as an attribute, but it takes cyclic values. How can I transform the feature to preserve the information that hours like '23' and '0' are close, not far apart? One way I could think of is the transformation: min(h, 23-h) Input: [0 1…

asked Jun 03 '15 at 05:56

Mangat Rai Modi

569
1
5
11
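For the cyclic 'hour' question above, the transformation named in the excerpt, min(h, 23-h), can be sketched as follows. This is only a minimal illustration of that one formula; the function name fold_hour and the choice to evaluate it on the integer hours 0 to 23 are assumptions made here, not part of the original question.

```python
def fold_hour(h: int) -> int:
    """Apply the transformation proposed in the question: min(h, 23 - h)."""
    return min(h, 23 - h)


# Evaluate the transformation on every hour of the day (assumed range 0..23).
hours = list(range(24))
folded = [fold_hour(h) for h in hours]

# Hours 0 and 23 land on the same value, so they are no longer far apart.
print(fold_hour(0), fold_hour(23))

# Hours 1 and 22 also land on the same value: the mapping is two-to-one.
print(fold_hour(1), fold_hour(22))
```

Note that under this mapping each early hour shares its value with a late hour, so the feature brings 23 and 0 together but can no longer tell such a pair apart.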

36

votes

3 answers

How to disable GPU with TensorFlow?

Using tensorflow-gpu 2.0.0rc0. I want to choose whether it uses the GPU or the CPU.

asked Sep 07 '19 at 21:14

Florin Andrei

1,120
1
9
13

36

votes

5 answers

What to set in steps_per_epoch in Keras' fit_generator?

I am replicating, in Keras, the work of a paper where I know the values of epoch and batch_size. Since the dataset is quite large, I am using fit_generator. I would like to know what to set in steps_per_epoch given epoch value and batch_size. Is…

asked Mar 16 '19 at 10:25

yamini goel

731
3
7
14

36

votes

1 answer

Time Series prediction using LSTMs: Importance of making time series stationary

In this link on Stationarity and differencing, it is mentioned that models like ARIMA require a stationarized time series for forecasting, as its statistical properties like mean, variance, autocorrelation etc. are constant over time. Since…

asked Nov 16 '17 at 07:57

PixelPioneer

795
2
9
10

36

votes

6 answers

How do I load FastText pretrained model with Gensim?

I tried to load the fastText pretrained model from here: Fasttext model. I am using wiki.simple.en:
from gensim.models.keyedvectors import KeyedVectors
word_vectors = KeyedVectors.load_word2vec_format('wiki.simple.bin', binary=True)
But, it shows the…

asked Jun 30 '17 at 02:14

Sabbiu Shah

753
1
6
9

36

votes

6 answers

Why do convolutional neural networks work?

I have often heard people say that convolutional neural networks are still poorly understood. Is it known why convolutional neural networks always end up learning increasingly sophisticated features as we go up the layers? What caused them…

asked Dec 23 '16 at 12:43

Praise the lord

461
1
4
5

36

votes

1 answer

Paper: What's the difference between Layer Normalization, Recurrent Batch Normalization (2016), and Batch Normalized RNN (2015)?

So, recently there's a Layer Normalization paper. There's also an implementation of it in Keras. But I remember there are papers titled Recurrent Batch Normalization (Cooijmans, 2016) and Batch Normalized Recurrent Neural Networks (Laurent, 2015).…

asked Jul 23 '16 at 09:46

Rizky Luthfianto

2,206
2
19
22

36

votes

6 answers

How to do SVD and PCA with big data?

I have a large set of data (about 8GB). I would like to use machine learning to analyze it. So, I think that I should use SVD then PCA to reduce the data dimensionality for efficiency. However, MATLAB and Octave cannot load such a large…

asked Sep 25 '14 at 08:40

David S.

547
2
6
8
