Most Popular

1500 questions
38
votes
4 answers

Do Random Forests overfit?

I have been reading around about Random Forests but I cannot really find a definitive answer about the problem of overfitting. According to the original paper of Breiman, they should not overfit when increasing the number of trees in the forest, but…
papafe
37
votes
4 answers

Meaning of latent features?

I am learning about matrix factorization for recommender systems and I am seeing the term latent features occurring too frequently but I am unable to understand what it means. I know what a feature is but I don't understand the idea of latent…
Jack Twain
37
votes
7 answers

Organized processes to clean data

From my limited dabbling with data science using R, I realized that cleaning bad data is a very important part of preparing data for analysis. Are there any best practices or processes for cleaning data before processing it? If so, are there any…
Jay Godse
37
votes
13 answers

What do you think of Data Science certifications?

I've now seen two data science certification programs - the Johns Hopkins one available at Coursera and the Cloudera one. I'm sure there are others out there. The Johns Hopkins set of classes is focused on R as a toolset, but covers a range of…
Steve Kallestad
37
votes
6 answers

Sentence similarity prediction

I'm looking to solve the following problem: I have a set of sentences as my dataset, and I want to be able to type a new sentence, and find the sentence that the new one is the most similar to in the dataset. An example would look like: New…
lte__
37
votes
2 answers

How to use the output of GridSearch?

I'm currently working with Python and Scikit learn for classification purposes, and doing some reading around GridSearch I thought this was a great way for optimising my estimator parameters to get the best results. My methodology is this: Split my…
Dan Carter
37
votes
1 answer

RNNs with multiple features

I have a bit of self-taught knowledge working with Machine Learning algorithms (the basic Random Forest and Linear Regression type stuff). I decided to branch out and begin learning RNNs with Keras. When looking at most of the examples, which…
Rjay155
36
votes
4 answers

What is a good way to transform Cyclic Ordinal attributes?

I have an 'hour' field as an attribute, but it takes cyclic values. How can I transform the feature to preserve the information that hours '23' and '0' are close, not far apart? One way I can think of is the transformation: min(h, 23-h) Input: [0 1…
Mangat Rai Modi
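For readers skimming this entry, here is a minimal sketch of the transformation named in the excerpt, min(h, 23-h), next to a sine/cosine encoding. The sine/cosine variant is an assumption added for comparison, not something stated in the excerpt, and the function names `fold_hour`, `encode_hour`, and `circular_distance` are hypothetical.

```python
import math


def fold_hour(h):
    """Transformation proposed in the question: min(h, 23 - h)."""
    return min(h, 23 - h)


def encode_hour(h, period=24):
    """Assumed alternative: place the hour on the unit circle."""
    angle = 2 * math.pi * h / period
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b):
    """Euclidean distance between two hours after the circle encoding."""
    sa, ca = encode_hour(a)
    sb, cb = encode_hour(b)
    return math.hypot(sa - sb, ca - cb)


# Folding makes 23 and 0 equal, but it also makes 1 and 22 equal,
# so distinct hours collapse onto the same value.
folded = [fold_hour(h) for h in (0, 1, 22, 23)]
```

With the circle encoding, hours 23 and 0 end up exactly as far apart as hours 0 and 1, while hours 0 and 12 are the farthest apart. The folding transformation cannot distinguish hour 1 from hour 22.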
36
votes
3 answers

How to disable GPU with TensorFlow?

Using tensorflow-gpu 2.0.0rc0. I want to choose whether it uses the GPU or the CPU.
Florin Andrei
36
votes
5 answers

What to set in steps_per_epoch in Keras' fit_generator?

I am replicating, in Keras, the work of a paper where I know the values of epoch and batch_size. Since the dataset is quite large, I am using fit_generator. I would like to know what to set in steps_per_epoch given epoch value and batch_size. Is…
yamini goel
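A small arithmetic sketch of one common convention for this setting, assuming the goal is for each epoch to pass over the dataset once: take the number of samples divided by the batch size, rounded up. The excerpt does not state the answer, and the helper name `steps_per_epoch` and the numbers below are hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Batches needed to cover num_samples once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset: 10,000 samples with a batch size of 32.
steps = steps_per_epoch(10_000, 32)
```

Under this convention the number of epochs does not enter the calculation. It only controls how many times those steps are repeated.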
36
votes
1 answer

Time Series prediction using LSTMs: Importance of making time series stationary

In this link on Stationarity and differencing, it has been mentioned that models like ARIMA require a stationarized time series for forecasting, as its statistical properties like mean, variance, autocorrelation etc. are constant over time. Since…
PixelPioneer
36
votes
6 answers

How do I load FastText pretrained model with Gensim?

I tried to load a fastText pretrained model from here: Fasttext model. I am using wiki.simple.en. Code: from gensim.models.keyedvectors import KeyedVectors; word_vectors = KeyedVectors.load_word2vec_format('wiki.simple.bin', binary=True) But, it shows the…
Sabbiu Shah
36
votes
6 answers

Why do convolutional neural networks work?

I have often heard people say that convolutional neural networks are still poorly understood. Is it known why convolutional neural networks always end up learning increasingly sophisticated features as we go up the layers? What caused them…
Praise the lord
36
votes
1 answer

Paper: What's the difference between Layer Normalization, Recurrent Batch Normalization (2016), and Batch Normalized RNN (2015)?

So, recently there's a Layer Normalization paper. There's also an implementation of it on Keras. But I remember there are papers titled Recurrent Batch Normalization (Cooijmans, 2016) and Batch Normalized Recurrent Neural Networks (Laurent, 2015).…
Rizky Luthfianto
36
votes
6 answers

How to do SVD and PCA with big data?

I have a large set of data (about 8GB). I would like to use machine learning to analyze it. So, I think that I should use SVD then PCA to reduce the data dimensionality for efficiency. However, MATLAB and Octave cannot load such a large…
David S.