Most Popular

1500 questions
69 votes • 4 answers

What is the use of torch.no_grad in pytorch?

I am new to PyTorch and started with this GitHub code. I do not understand the comment on lines 60-61 of the code: "because weights have requires_grad=True, but we don't need to track this in autograd". I understood that we mention requires_grad=True…
mausamsion • 1,282 • 1 • 10 • 14
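The comment the asker quotes is about pausing autograd's bookkeeping during a manual weight update. A minimal sketch (the variable names here are illustrative, not from the linked repository):

```python
import torch

# Weights that autograd tracks, because requires_grad=True.
w = torch.randn(3, requires_grad=True)
x = torch.ones(3)

loss = (w * x).sum()
loss.backward()          # populates w.grad

# A manual SGD-style update: we change w, but we do NOT want this
# arithmetic recorded in the autograd graph, so wrap it in no_grad.
with torch.no_grad():
    w -= 0.1 * w.grad    # in-place update, not tracked
    w.grad.zero_()       # reset gradients for the next step

print(w.requires_grad)   # still True: no_grad only pauses tracking
```

Note that `no_grad` does not flip `requires_grad` off; it just suspends graph construction while the block runs.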
69 votes • 11 answers

Why should the data be shuffled for machine learning tasks?

In machine learning tasks it is common to shuffle the data and normalize it. The purpose of normalization is clear (to have the same range of feature values). But, after struggling a lot, I did not find any valuable reason for shuffling data. I have read…
Green Falcon • 14,058 • 9 • 57 • 98
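One common reason for shuffling: if the data arrives in some order (e.g. sorted by class), mini-batches drawn sequentially are not representative, so shuffling makes each batch roughly i.i.d. The mechanical part, assuming numpy arrays, is to permute features and labels together:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10).reshape(5, 2)   # toy features, one row per example
y = np.array([0, 1, 0, 1, 1])     # toy labels

# Shuffle features and labels with the SAME permutation,
# so every row keeps its own label.
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]
```

Shuffling once per epoch (a fresh `perm` each pass) is the usual practice.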
69 votes • 5 answers

Adding Features To Time Series Model LSTM

I have been reading up a bit on LSTMs and their use for time series, and it has been interesting but difficult at the same time. One thing I have had difficulty understanding is the approach to adding additional features to what is already a list…
Rjay155 • 1,215 • 2 • 12 • 9
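Adding features to an LSTM input is mostly a shaping exercise: extra features become the last axis of a `(samples, timesteps, features)` tensor. A small numpy sketch of the windowing (the window length and feature count here are arbitrary choices for illustration):

```python
import numpy as np

# A toy multivariate series: a target value plus one extra feature per step.
T, n_features = 100, 2
series = np.random.randn(T, n_features)

window = 10  # timesteps fed to the model per sample

# Build (samples, timesteps, features) windows — the shape most LSTM
# layers expect (e.g. Keras with input_shape=(window, n_features)).
X = np.stack([series[i:i + window] for i in range(T - window)])
y = series[window:, 0]  # predict the first feature one step ahead

print(X.shape)  # (90, 10, 2)
```

Swapping `n_features` for a larger value is all it takes to add further per-timestep features.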
68 votes • 6 answers

What is the Q function and what is the V function in reinforcement learning?

It seems to me that the $V$ function can be easily expressed by the $Q$ function and thus the $V$ function seems to be superfluous to me. However, I'm new to reinforcement learning so I guess I got something wrong. Definitions Q- and V-learning are…
Martin Thoma • 18,880 • 35 • 95 • 169
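The asker's intuition can be made concrete: $V$ is recoverable from $Q$ (via $V^\pi(s) = \sum_a \pi(a \mid s)\, Q^\pi(s,a)$ and $V^*(s) = \max_a Q^*(s,a)$), but going the other way requires the transition model. A tabular sketch with made-up numbers:

```python
import numpy as np

# Toy tabular case: 3 states, 2 actions.
Q = np.array([[1.0, 3.0],
              [0.5, 0.2],
              [2.0, 2.0]])       # Q(s, a)
pi = np.array([[0.5, 0.5],
               [1.0, 0.0],
               [0.2, 0.8]])      # pi(a | s), rows sum to 1

V_pi = (pi * Q).sum(axis=1)     # V^pi(s) = sum_a pi(a|s) Q^pi(s,a)
V_star = Q.max(axis=1)          # V*(s)   = max_a Q*(s,a)

print(V_pi)    # [2.  0.5 2. ]
print(V_star)  # [3.  0.5 2. ]
```

So $V$ is not superfluous so much as a derived quantity; algorithms that learn $V$ directly trade action information for a smaller table.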
67 votes • 9 answers

Clustering geo location coordinates (lat,long pairs)

What is the right approach and clustering algorithm for geolocation clustering? I'm using the following code to cluster geolocation coordinates: import numpy as np import matplotlib.pyplot as plt from scipy.cluster.vq import kmeans2,…
rokpoto.com • 813 • 1 • 7 • 6
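The catch with k-means on raw lat/lon is that it treats degrees as Euclidean coordinates, which distorts distances away from the equator. The great-circle (haversine) distance is the natural metric; a density-based method such as scikit-learn's `DBSCAN(metric='haversine')` on radian coordinates is one common alternative. The metric itself, in plain numpy:

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

print(haversine(52.52, 13.405, 48.8566, 2.3522))  # Berlin–Paris, ≈ 878 km
```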
67 votes • 2 answers

Sparse_categorical_crossentropy vs categorical_crossentropy (keras, accuracy)

Which is better for accuracy or are they the same? Of course, if you use categorical_crossentropy you use one hot encoding, and if you use sparse_categorical_crossentropy you encode as normal integers. Additionally, when is one better than the…
Master M • 773 • 1 • 6 • 5
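The two losses compute the same number; they only differ in the label encoding they accept, so neither is better for accuracy. A numpy sketch of the equivalence (probabilities here are made up):

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])       # softmax outputs, 3 classes

labels_int = np.array([0, 1])             # sparse form: integer class ids
labels_onehot = np.eye(3)[labels_int]     # one-hot form

# categorical_crossentropy: expects one-hot targets
cce = -(labels_onehot * np.log(probs)).sum(axis=1).mean()
# sparse_categorical_crossentropy: expects integer targets
scce = -np.log(probs[np.arange(len(labels_int)), labels_int]).mean()

print(np.isclose(cce, scce))  # True — same loss, different label encoding
```

The sparse form is simply more memory-friendly when there are many classes.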
67 votes • 5 answers

In softmax classifier, why use exp function to do normalization?

Why use softmax as opposed to standard normalization? In the comment area of the top answer to this question, @Kilian Batzner raised two questions which also confuse me a lot. It seems no one gives an explanation except for the numerical benefits. I get the…
Hans • 773 • 1 • 6 • 5
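One concrete reason for the `exp`: dividing raw scores by their sum breaks as soon as scores can be negative or sum to zero, while `exp` maps every score to a positive number first. A small comparison with made-up scores:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability; softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])

# Naive "standard normalization" breaks on negative scores:
naive = scores / scores.sum()
print(naive)                 # contains a negative "probability"

p = softmax(scores)
print(p, p.sum())            # valid probabilities summing to 1
```

The shift-invariance (subtracting `z.max()` changes nothing) is also what makes softmax pair cleanly with the cross-entropy loss.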
67 votes • 4 answers

Why is mini-batch training better than one single "batch" with all training data?

I often read that, in the case of deep learning models, the usual practice is to apply mini-batches (generally small ones, 32/64) over several training epochs. I cannot really fathom the reason behind this. Unless I'm mistaken, the batch size is the…
Hendrik • 8,587 • 17 • 42 • 55
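Part of the answer is simply update frequency: with mini-batches you take many cheap, noisy gradient steps per pass over the data instead of one exact, expensive full-batch step. A sketch of one epoch (the gradient computation is elided; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
y = rng.standard_normal(1000)

batch_size = 32
perm = rng.permutation(len(X))       # reshuffle each epoch in practice

n_steps = 0
for start in range(0, len(X), batch_size):
    idx = perm[start:start + batch_size]
    xb, yb = X[idx], y[idx]
    # ... compute the gradient on (xb, yb) and update parameters here ...
    n_steps += 1

print(n_steps)  # 32 parameter updates per epoch, versus 1 for full batch
```

The gradient noise itself is often argued to help generalization, which is the deeper part of the question.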
65 votes • 5 answers

How to get accuracy, F1, precision and recall, for a keras model?

I want to compute the precision, recall and F1-score for my binary KerasClassifier model, but can't find any solution. Here's my actual code: # Split dataset in train and test data X_train, X_test, Y_train, Y_test = train_test_split(normalized_X,…
ZelelB • 1,057 • 2 • 11 • 14
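One common route is to get hard predictions out of the Keras model and hand them to `sklearn.metrics.classification_report`. The arithmetic behind those numbers is small enough to write out, which also makes the definitions explicit (toy labels below):

```python
import numpy as np

def binary_prf1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
print(binary_prf1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

For a model whose output is a probability, remember to threshold (e.g. `(model.predict(X) > 0.5)`) before computing these.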
65 votes • 6 answers

When is a Model Underfitted?

Logic often states that by underfitting a model, its capacity to generalize is increased. That said, clearly at some point underfitting a model causes it to become worse regardless of the complexity of the data. How do you know when your model has…
blunders • 1,932 • 2 • 15 • 19
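The usual diagnostic: a model is underfit when its *training* error is itself high (and close to the validation error), not just its validation error. A small polynomial-fitting sketch of that symptom, with a deliberately nonlinear target:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(200)    # nonlinear target
x_tr, y_tr = x[::2], y[::2]                           # even indices: train
x_va, y_va = x[1::2], y[1::2]                         # odd indices: validation

def errors(deg):
    """Train/validation MSE of a degree-`deg` polynomial fit."""
    coef = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    return tr, va

print(errors(1))   # degree 1 underfits: both errors stay large
print(errors(5))   # degree 5: both errors drop toward the noise floor
```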
64 votes • 4 answers

Does batch_size in Keras have any effects in results' quality?

I am about to train a big LSTM network with 2-3 million articles and am struggling with Memory Errors (I use AWS EC2 g2x2large). I found out that one solution is to reduce the batch_size. However, I am not sure if this parameter is only related to…
hipoglucido • 1,170 • 1 • 10 • 17
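One relevant fact for the memory question: when the loss is a mean over examples, the full-batch gradient equals the average of equal-sized micro-batch gradients, so cutting `batch_size` and accumulating trades memory for extra passes without changing the update. A numpy demonstration on a linear least-squares loss (an illustrative stand-in for the LSTM):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((256, 4)), rng.standard_normal(256)
w = rng.standard_normal(4)

def grad(xb, yb, w):
    """Gradient of 0.5 * mean((xb @ w - yb)**2) with respect to w."""
    return xb.T @ (xb @ w - yb) / len(xb)

# Full-batch gradient (needs all 256 rows in memory at once)...
g_full = grad(X, y, w)

# ...equals the average over 8 micro-batches of 32 rows each.
g_acc = np.mean([grad(X[i:i + 32], y[i:i + 32], w)
                 for i in range(0, 256, 32)], axis=0)

print(np.allclose(g_full, g_acc))  # True
```

Whether *smaller* effective batches change result quality is a separate question about gradient noise, not memory.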
63 votes • 9 answers

Tools and protocol for reproducible data science using Python

I am working on a data science project using Python. The project has several stages. Each stage consists of taking a data set and, using Python scripts, auxiliary data, configuration, and parameters, creating another data set. I store the code in…
Yuval F • 761 • 1 • 6 • 7
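The core idea behind most reproducibility tooling (DVC, Make-style pipelines) is to tie every output to a fingerprint of everything that produced it: code, parameters, and input data. A stdlib-only sketch of that idea (the function name is ours, not any tool's API):

```python
import hashlib
import json

def stage_fingerprint(script_path, params, input_paths):
    """Hash everything a pipeline stage depends on, so an output data set
    can be traced back to the exact code, config and inputs that made it."""
    h = hashlib.sha256()
    with open(script_path, "rb") as f:
        h.update(f.read())                                 # the code itself
    h.update(json.dumps(params, sort_keys=True).encode())  # parameters/config
    for p in sorted(input_paths):                          # input data sets
        with open(p, "rb") as f:
            h.update(f.read())
    return h.hexdigest()[:12]
```

Storing this fingerprint next to each stage's output makes "which inputs produced this file?" answerable later; dedicated tools automate the same bookkeeping plus caching.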
63 votes • 10 answers

Machine learning - features engineering from date/time data

What are the common/best practices for handling time data in machine learning applications? For example, if a data set has a column with an event timestamp, such as "2014-05-05", how can you extract useful features from this column, if any? Thanks…
Igor Bobriakov • 1,071 • 2 • 9 • 11
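A typical starting point is to decompose the timestamp into calendar components, plus a cyclical sin/cos encoding so the model sees December and January as neighbors. A stdlib sketch using the question's own example date:

```python
import math
from datetime import datetime

def datetime_features(ts: str):
    """Expand a 'YYYY-MM-DD' timestamp into model-ready features."""
    dt = datetime.strptime(ts, "%Y-%m-%d")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "dayofweek": dt.weekday(),             # 0 = Monday
        "is_weekend": int(dt.weekday() >= 5),
        # cyclical encoding: keeps month 12 close to month 1
        "month_sin": math.sin(2 * math.pi * dt.month / 12),
        "month_cos": math.cos(2 * math.pi * dt.month / 12),
    }

print(datetime_features("2014-05-05"))
```

With pandas, `pd.to_datetime` plus the `.dt` accessor yields the same components column-wise.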
63 votes • 6 answers

Should a model be re-trained if new observations are available?

So, I have not been able to find any literature on this subject, but it seems like something worth giving a thought: what are the best practices in model training and optimization if new observations are available? Is there any way to determine the…
neural-nut • 1,783 • 3 • 17 • 27
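Besides periodic full retraining, one option for models trained by gradient methods is incremental updating: keep the existing weights and take small gradient steps on each new observation (scikit-learn exposes this pattern as `partial_fit`). A minimal numpy sketch on a linear model with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

# Initial model, fit on the first batch (closed-form least squares).
X0 = rng.standard_normal((200, 2))
y0 = X0 @ w_true + 0.1 * rng.standard_normal(200)
w, *_ = np.linalg.lstsq(X0, y0, rcond=None)

def update(w, x_new, y_new, lr=0.05):
    """One SGD step on a fresh observation — the incremental alternative
    to retraining from scratch every time new data arrives."""
    return w - lr * (x_new @ w - y_new) * x_new

# Stream of new observations: refine the existing weights in place.
for _ in range(500):
    x = rng.standard_normal(2)
    y = x @ w_true + 0.1 * rng.standard_normal()
    w = update(w, x, y)

print(np.round(w, 1))  # stays close to the true weights
```

Whether this suffices depends on drift: if the data distribution shifts, monitoring validation error and triggering a full retrain is the safer practice.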
62 votes • 6 answers

Latent Dirichlet Allocation vs Hierarchical Dirichlet Process

Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) are both topic modeling processes. The major difference is LDA requires the specification of the number of topics, and HDP doesn't. Why is that so? And what are the…
alvas • 2,410 • 7 • 25 • 40
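The short version of "why doesn't HDP need K?": LDA draws topic weights from a finite Dirichlet of fixed size K, while HDP draws them from a Dirichlet process, whose stick-breaking construction spreads mass over an unbounded list of topics with only a handful getting appreciable weight. A numpy sketch of stick-breaking (truncated for computation):

```python
import numpy as np

def stick_breaking(alpha, n_sticks, rng):
    """GEM(alpha) weights: break a unit-length stick repeatedly.
    Truncated at n_sticks; most mass lands on the first few topics,
    which is how HDP infers an effective topic count rather than fixing K."""
    betas = rng.beta(1.0, alpha, size=n_sticks)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=1.0, n_sticks=50, rng=rng)

print(weights.sum())            # close to 1 (truncation leaves a remainder)
print((weights > 0.01).sum())   # only a handful of topics get real mass
```

Larger `alpha` spreads mass over more topics; inference then decides how many topics the corpus actually supports.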